Boston University
College of Liberal Arts
Library


378.744 


BOSTON  UNIVERSITY 
GRADUATE  SCHOOL 


Thesis 


TEACHING HISTORY, THROUGH TESTING, IN THE HIGH SCHOOL


by 


Carl  Upton  Harvey 
(B.S.  in  Ed.  Boston  University,  1931) 


submitted  in  partial  fulfilment  of  the 
requirements  for  the  degree  of 
Master of Arts
1932 




A.M. 1932


CONTENTS

Teaching History, through Testing, in the High School

Chapter 1

Introductory

1) Need for the study of measuring knowledge and achievement in history

2) The principal types of measurements in history in the modern high school

Chapter 2

Uses of History Tests in the High School

1) History tests may be used for examinations

2) History tests may measure a teacher's efficiency

3) Use of history tests to give pupils objective standards

4) Diagnosis is a worthy use of history tests

5) History tests help in placement and classification

6) History tests may be used for review and recall

7) History tests enhance the intrinsic worth of learning

8) History tests may measure memory

9) History tests may be used for comparison

10) Use of history tests to improve teaching

11) Summary

Chapter 3

Criteria for the Selection of History Tests in the High School

1) Criteria to be considered

2) Validity

3) Reliability

4) Other things being equal, choose a test that has norms

5) Other things being equal, choose a test having alternative forms

6) Choose a test because it is scaled

7) Ease of administration

8) Other things being equal, choose a test that is inexpensive

9) Summary

Chapter 4

Teaching History through the Use of Oral Questioning

1) The uses of oral questioning

2) Oral versus written tests

3) Conclusion

Chapter 5

Teaching History through the Use of the Traditional Examination

1) Nature of the traditional examination

2) Advantages of the traditional examination

3) Disadvantages of the traditional examination

4) Conclusion

Chapter 6

Teaching History through the Use of the Standard Test

1) Growth of the standard test

2) Nature and purpose of the standard test

3) Additional advantages of the standard test

4) Some limitations of the standard test

5) Example of the standard test

6) Conclusion

Chapter 7

Teaching History through the Use of the Objective or New-Type Examination

1) Nature of the new-type examination

2) Advantages of the new-type examination

3) Disadvantages of the new-type examination

4) Five most generally useful of the new-type examinations

5) Conclusion

Chapter 8

Summary

Appendix




Teaching  History,  through  Testing,     in  the  High  School. 

Chapter  1 

Introductory 

1)  Need  for  the  study  of  measuring  knowledge  and 
achievement  in  history. 
2) The principal types of measurements in history
in the modern high school.



Chapter  1 
Introductory 

1) Need for the study of measuring knowledge and achievement in history

The value of educational measurement in general is no longer a matter of doubt. The use of tests and scales in thousands of schools has shown very clearly that testing has brought inestimable benefits to school administrators and teachers, to pupils in all grades and of all ages, and to the community at large. A school system in which educational measurement is widely used is likely to be markedly superior to one in which there is no testing program.

The past quarter of a century has witnessed the rise of educational measurement to the plane of conscious striving for objective, impartial, and comparative means of portraying the absolute and relative achievements of pupils. The measurement of achievement has admittedly been the principal reason for examinations. This idea is undoubtedly sound. It may require certain qualification, but all seem to agree that the first purpose of a test or examination is to ascertain the degree to which individual pupils have profited by instruction.

It is plain that, although scientific measurement is still a very new thing in education, it has passed the experimental stage. It has become so important that no teacher can be said to have had complete professional training without some knowledge of this subject. Thus the teacher who is thoroughly equipped for her position must know something of the aims, methods, materials, results, and general educational implications of this important work.

2) The principal types of measurements in history
in  the  modern  high  school   

Four types of measurements in history exist side by side in the modern high school. These are:

1. Oral questioning.

2. The traditional examination.

3. The standard test.

4. The objective or new-type examination.

There are other means of evaluating school results, but these four are the most important. With such a variety of methods open to the educator, and with so little of a final character known about their relative merits, the only course open to us is to consider both the logic and the growing body of experimental findings supporting or undermining the value of each. This is the task of this thesis in its entirety.




Chapter  2 

Uses  of  History  Tests  in  the  High  School 


1) History tests may be used for examinations.

2)  History  tests  may  measure  a  teacher's  efficiency. 

3)  Use  of  history  tests  to  give  pupils  objective 
standards. 

4)  Diagnosis  is  a  worthy  use  of  history  tests. 

5)  History  tests  help  in  placement  and  classification. 

6) History tests may be used for review and recall.

7)  History  tests  enhance  the  intrinsic  worth  of 
learning. 

8)  History  tests  may  measure  memory. 

9)  History  tests  may  be  used  for  comparison. 

10)  Use  of  history  tests  to  improve  teaching. 

11)  Summary. 


Chapter 2

Uses of History Tests in the High School¹ ² ³

1) History tests may be used for examinations

One of the widest uses which all forms of testing have had in the past, and one of their more important uses in the future, lies in the field of examinations. In this sense, tests are used to find out the status of an individual pupil at any given time from the point of view of his achievement. The primary purpose of examining may be to determine the fitness of a pupil for promotion. As long as promotion to a higher grade depends largely on the academic fitness of the individual for such promotion, just so long is this method of examining for promotion just and right. The great need, however, is to make sure that the examinations used for this purpose really measure what is wanted, and do measure real achievement.

Examinations  to  determine  the  status  of  pupils 
who  have  just  entered  a  school  are  also  much  needed. 
This,  again,  is  a  form  of  determination  of  the 
status  of  a  pupil,  not  primarily  as  a  check  after 
teaching  has  been  done,  but  rather  to  find  out  where 
further  teaching  should  begin. 

In these forms of examinations, tests can be used to advantage in securing results which are not only useful but also justified.

1 - From Symonds, Measurement in Secondary Education, pp. 1-2.
2 - From Russell, Classroom Tests, pp. 4-11.
3 - From Ruch and Stoddard, Tests and Measurements in High School Instruction, pp. 6-44.

2) History tests may measure a teacher's efficiency

History tests may also be used as a measure of a teacher's efficiency. It is true that we cannot say a teacher has any definite percentage of teaching efficiency, as we can measure the horsepower developed by a gasoline engine, and it would probably be undesirable if we could. We can, however, through the analysis of a series of tests, say that a teacher is doing better with his pupils than teachers of similar children elsewhere, or as well, or more poorly. This, then, may become for the individual teacher a potent factor in helping him to see where improvements in his teaching can be made and where effort is best placed. Without necessarily knowing the absolute efficiency with which a teacher teaches, it becomes possible in this way to increase efficiency by preventing unnecessary repetition of teaching, or by providing more clearly for teaching needs.

3) Use of history tests to give pupils objective standards

A further use of tests is to give pupils some objective standard by which to judge the character or quality of the work they have done.

Pupils immersed in the details of school study frequently find it difficult to appreciate the objectives of that study. For that reason they have little, save their own interest or inclination, by which to judge the relative importance of the various phases of a subject as they arise. Tests may furnish such objectives (or at least a high-grade substitute for them) by helping students to see relative values and to appreciate the necessity for accurate and complete knowledge.

In  another  way  the  same  object  may  be  accomplished 
through  the  fact  that  the  tests  may  provide  a  motive 
for  the  study.   If  the  motive  is  not  that  of  passing 
the  subject,  it  is  probable  that  it  is  worthy  and 
capable  of  furnishing  a  worthy  objective.  Objective 
standards  and  objective  goals  are  superior  to  non- 
objective  standards  and  unknown  goals. 

4) Diagnosis is a worthy use of history tests

In  addition  to  these  uses,  however,  one  of  the 
worthy  uses  to  which  testing  can  be  put  lies  in 
diagnosis.  This  is  a  field  as  broad  as  that  of  the 
school  itself,  but  one  to  which,  until  the  present, 
there  has  been  but  little  attention  paid. 

Diagnosis of the difficulties in the learning process, of the difficulties of individual pupils with the various types of subject matter, and of those in the various phases of the school curriculum: all this is one of the widest and most important fields of endeavor for the teacher.

Diagnosis  is  a  field  of  constant  inquiry,  a  field 
of  highly  focused  endeavor,  and  a  field  of  rapid 
change.  Here  it  is  that  the  teacher  has  to  do  with 
the  kaleidoscopic  changes  of  tastes  and  perception, 
of  likes  and  dislikes,  of  feelings  and  emotions. 
Here  it  is  that  he  has  to  work  with  the  individual 
differences  of  his  pupils  and  with  their  inherent 
abilities.  No  adequate  method  by  which  these 
differences  could  be  measured,  no  fair  treatment 
of  the  pupils  of  a  class,  was  possible,  until  tests 
and  the  method  of  measurement  as  we  know  it  today 
were  developed. 

The  diagnosis  of  individual  difficulties  and  the 
remedial  measures  which  must  be  taken  to  counteract 
these  difficulties  lie  within  the  province  of  the 
teacher.  They  can  be  accomplished  only  through  the 
medium  of  adequate  testing. 

5)  History  tests  help  in  placement  and  classification 

Another use of history tests which is being rapidly developed at the present time is the proper placement of new pupils in a school and the reclassification of pupils already there.


On the one hand, pupils coming into a new school district, or into a new school system, may have been taught by different standards, or possibly with a different curriculum, from the pupils in the schools they are entering. History tests can be used to advantage with these prospective pupils to determine not only their absolute achievement but also their abilities in relation to the other pupils in the school. The results of these tests provide justification for whatever placements are made.

On the other hand, many pupils, under the former basis of promotion, were frequently misplaced by several grades, more often in grades below their actual school level than above it. In this case, a battery of tests administered throughout a school may be an objective determiner of the status of all pupils in the school, and it may also be a valid reason for changing the grade classification of many of them.

6) History tests may be used for review and recall

One of the wider uses of testing is testing for purposes of review, or to help in the process of recall.

Psychologists have made many studies of the rate at which various subjects and skills are forgotten after they have once been learned. In conformity with the "Laws of Learning," it has been found that, without exception, whatever is learned tends to deteriorate with disuse, other things being equal, but also that each review or recall makes the learned bonds stronger and stronger, until, with enough repetition of the right kind, the bonds may become so well established above the threshold of memory as to be relatively permanent.

Under such conditions it would seem to be one of the great aims of the teacher not only to teach but also to reteach, so as to increase the retention of the learned elements. Some types of teachers' tests help to make the subject matter so vivid and so full of interest that it may be retained, and retained in a psychologically desirable way.

In addition to retention, however, a review serves other purposes which are important in teaching and which can be aided through testing. One of these is organization and reorganization. It seems true that things should not only be learned in one way, but frequently should be learned in others as well, if they are to be really useful. "Learning things in other ways" means merely making different applications of them or finding new relationships to them. This is reorganization. Testing can be used to great advantage for this purpose.

7) History tests enhance the intrinsic worth of learning

Another use of history tests is one which has as yet had but little attention, but which is rich in possibilities for both the present and the future. It helps to realize one of the great ideals of education, namely, to enhance for this and for future generations the intrinsic worth of learning.

At a time when the efforts of educators are focused upon an ideal of worthwhile activity on the part of pupils in school, when the curriculum is being scanned to remove traces of artificiality, when courses of study are being planned to eliminate subjects or parts of subjects that are included largely because of prejudiced tradition, and when the work of the pupil in school is directed toward making his life there richer and more like life outside of school, any plan which will enhance the intrinsic worth of learning to the pupil is a step toward the realization of those ideals.

Some learning has passed from the stage of coercion, through the stage of reward, past the stage of rivalry, and is now founded on the worth of learning for its own sake. Tests rightly used, emphasizing the worth of learning and the desirableness of knowledge, especially in terms of its usability, are means of bringing more of the materials of education to this level.

We can see the reasons for the ineffectiveness of rewards and sugar-coating as a basis for right learning. We can see the unwholesome ideals connected with emphasizing rivalry with one's fellows. It should be clear that real education results from rivalry with one's own best previous efforts.

In daily living, rivalry with one's fellows seems, from a superficial point of view, to be the paramount motive for success. But one need only examine the conspicuous cases of success in one's own environment to discover that these are not merely the result of a selfish rivalry with the success of others. The conspicuously successful physician, lawyer, pastor, or teacher has no rival save himself. If this is the criterion for success in life, it should be the criterion for success in school.

Tests rightly used and rightly interpreted furnish a means by which pupils can rival their own best previous efforts. They are a means of promoting one of the highest types of social education.

8) History tests may measure memory

History tests may be used to measure the memory of pupils, and several types of memory may be tested. The type of test most frequently used in school is one that determines the amount of material an individual has retained. John has been studying about Africa. The test is designed to determine how much John still remembers of what he was supposed to have learned about Africa. This is a good test of memory of pure knowledge; but, with the emphasis now being placed upon other types of memory and upon other attributes of knowledge, the test is in this respect somewhat too narrow.

It is frequently good to remember things not as separate facts but as related facts. In this case another type of test is needed: one which measures not only a knowledge of the facts themselves but also a knowledge of their relation to one another.

It may also be desirable on occasion merely to have a memory which recognizes a fact, without necessarily having the ability to recall the fact unassisted. To recognize the truth of a fact, with the fact out of the context in which it is usually found, is a universal need of daily life and may on occasion prove as valuable as any other form of memory.

The same may be said of the order of a series of facts. It is often as necessary for an individual to know the order of a series of facts as to remember the facts themselves.

All these types of memory should be used in school, and we should encourage them, in addition to merely stimulating the remembrance of facts supposedly "memorized" or "learned."

9) History tests may be used for comparison

One result of the testing now being done in the schools is that a teacher may compare his class, and the work of its individual members, with like pupils in like classes in other places. In a nation as large as this, a nation where the ideals of education are (in their more fundamental aspects at least) so universal in all sections, and where the character of the sections and of the people differs to such an extent, it is of great value that a teacher should know what the standards may be in other parts of the country, and be able to compare his standards and his pupils with these. Tests may furnish the only reliable means by which this can be done.

10)  Use  of  history  tests  to  improve  teaching  

History tests may be used in several ways to improve teaching. One way may be considered in its relation to the pupil. Tests furnish the teacher with a more complete knowledge of his pupils. Through testing he can discover the individual status of the pupils, together with their individual difficulties and misunderstandings. He can also discover the difficulties and misunderstandings that are characteristic of like pupils. When the teacher anticipates these difficulties in his teaching, he is improving it.

A second way may be in the subject matter itself. Tests may be used to increase the value of materials already in use by giving them wider application or greater implication, and they may also be used as guides to needed or desirable extensions of these materials. Either of these uses of tests should result in improved teaching.

A third way of improving teaching may be in method. History tests furnish objective results of the methods by which teaching is accomplished. These objective results are not only a measure of the pupil and his accomplishment; they are also a measure of the effectiveness of the method which the teacher has used. When varying methods are contrasted in terms of the results secured, and the better methods are chosen for future use, better teaching is one result.


c 


17 


A fourth way in which teaching may be improved is through the teacher himself. Here the improvement may result from the wider knowledge which the testing makes necessary, or from the greater skill and confidence in teaching which the tests make possible, or, most important of all, from the added stimulation to constructive thinking which accompanies the testing.

11) Summary

History tests are being used in many ways besides merely for the purpose of examination and promotion. They may be used to test the efficiency of a teacher, or to examine pupils for purposes of locating beginning points in teaching, or for determining school status. They may promote review and recall either by increasing the retentiveness of pupils or by organizing and reorganizing the work that has been covered. They may be used for placement of pupils, or for classification. They may be given in order to diagnose difficulties of pupils, and they may in that way provide a basis for remedial measures. They may be used to compare a class or a pupil in one part of the nation with another in a different part, or with the composite pupil of all parts. They may be used for the motivation of school work, for the promotion of real and not artificial interest, bringing with that the handmaiden of interest, active attention. They may provide pupils with objective goals for school study. Most important of all, to the extent to which their influence may be felt, history tests may improve teaching.




Chapter  3 

Criteria  for  the  Selection  of  History  Tests  in  the 
High  School. 

1) Criteria to be considered.

2) Validity.

3) Reliability.

4) Other things being equal, choose a test that has norms.

5) Other things being equal, choose a test having alternative forms.

6) Choose a test because it is scaled.

7) Ease of administration.

8) Other things being equal, choose a test that is inexpensive.

9)  Summary. 




Chapter 3

Criteria for the Selection of History Tests in the High School¹ ²

1) Criteria to be considered

It is the purpose of this chapter to present a list of criteria which should be employed in selecting standardized history tests, and to discuss and explain the meaning and application of each. Several of the same criteria are also suitable for use with teacher-made, or unstandardized, tests, but others have no application to such tests. The reader will have no difficulty in recognizing which are applicable to other tests and which are not, since the discussion will make clear, without saying so explicitly, that certain of them cannot be expected to apply to ordinary classroom tests.

For  many  workers  in  educational  measurements  it 
is  more  worthwhile  to  be  able  to  recognize  a  good 
test  by  its  earmarks  than  to  know  intimately  the 
many  tests  -  good,  mediocre,  and  poor  -  that  exist 
today. 

The chief criteria or bases of selection, some of which may be subdivided, are as follows:

I Validity
II Reliability
    (a) Objectivity
III Norms
IV Duplicate and equivalent forms
V Scaling
VI Ease of administration
VII Cost

These are arranged in approximate order of importance with regard to the real merit of tests. From the more practical standpoint of selecting history tests which it is possible or practicable to employ, however, the last two items are often of greater importance than their position indicates.

1 - From Ruch and Stoddard, Tests and Measurements in High School Instruction, pp. 45-68.
2 - From Symonds, Percival M., Measurement in Secondary Education, pp. 278-307.

2)  Validity  

The first concern of one who wishes to choose a standardized history test is to make sure that the test really measures what it purports to measure. (Wood¹ emphasizes validity as the fundamental desideratum in test construction.) The adequacy and detail with which a test measures a trait, function, or school subject is called its validity. Validity is measured or determined by the correlation of scores on the test with some independent criterion of the school subject in question. The criterion in the case of history is some other measure of the subject which has no reference to the test whose validity
is  being  considered. 
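As a rough illustration of the correlation just described, the following is a minimal sketch, assuming the Pearson product-moment coefficient (the text does not name a particular formula) and using hypothetical scores for five pupils. The function name `validity_coefficient` and all data are illustrative assumptions, not part of any published test.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion):
    """Pearson product-moment correlation between pupils' scores on a test
    and an independent criterion (e.g. teachers' marks), pupil by pupil."""
    n = len(test_scores)
    if n != len(criterion) or n < 2:
        raise ValueError("need two equal-length lists covering at least two pupils")
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion) / n
    # Sum of cross-products of each pupil's deviations from the two means
    s_tc = sum((t - mean_t) * (c - mean_c) for t, c in zip(test_scores, criterion))
    # Sums of squared deviations for each measure
    s_tt = sum((t - mean_t) ** 2 for t in test_scores)
    s_cc = sum((c - mean_c) ** 2 for c in criterion)
    return s_tc / sqrt(s_tt * s_cc)


# Hypothetical data: history-test scores and teachers' marks for five pupils
scores = [42, 55, 61, 70, 78]
marks = [68, 72, 75, 85, 88]
print(f"validity coefficient: {validity_coefficient(scores, marks):.2f}")
```

For these made-up figures the coefficient comes out near 0.97. A value near 1 means the test ranks pupils much as the criterion does, but the coefficient can be no better evidence of validity than the criterion itself is.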

In the case of achievement tests, the independent criteria to be used for validation are few in number. One must usually fall back upon school marks or teachers' estimates of achievement as a criterion.

1 - Wood, Ben D. "Studies of Achievement Tests," Journal of Educational Psychology, 17: 125-129, Feb. 1926.
Often, where several tests have been constructed in history, it is possible to determine the relative validity of the separate tests by constructing a composite of all the tests and determining the
correlation  of  each  test  with  this  composite. 
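The composite method can be sketched in the same hedged way. The text speaks only of "the composite of all the tests"; summing each pupil's standardized (z) scores is an assumption made here so that a test with a wider spread of raw scores does not dominate the composite. The names `relative_validities`, `standardize`, and `pearson_r`, and all scores, are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def standardize(scores):
    """Convert raw scores to z-scores (assumes the scores are not all equal)."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


def relative_validities(tests):
    """Correlate each test with the composite of all the tests.

    `tests` maps a test name to a list of scores, with pupils in the same
    order for every test. The composite is each pupil's sum of z-scores.
    """
    z = {name: standardize(scores) for name, scores in tests.items()}
    n_pupils = len(next(iter(z.values())))
    composite = [sum(zs[i] for zs in z.values()) for i in range(n_pupils)]
    return {name: pearson_r(scores, composite) for name, scores in tests.items()}


# Hypothetical scores of six pupils on three history tests
tests = {
    "Test A": [30, 45, 50, 62, 70, 81],
    "Test B": [12, 20, 18, 25, 31, 33],
    "Test C": [55, 40, 60, 52, 48, 65],
}
results = relative_validities(tests)
for name, r in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {r:.2f}")
```

With these figures, Test C, whose scores agree least with the other two, correlates least with the composite. Because each test forms part of the composite it is correlated with, its coefficient is somewhat inflated; a common refinement is to correlate each test only with the composite of the other tests.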

In judging the validity of a test one should investigate how the material of the test was assembled. Good standardized tests usually carry statements, either in the manual of directions or in articles in technical periodicals, describing how the material for the test was gathered. One should search for such a statement and, if none is to be found, may immediately suspect the care used in constructing the test. If there is no description of how the material was assembled, one may suppose that it was hastily collected, much as a teacher collects material for an informal test. With material gathered in this way, a test is almost sure to be short-lived and doomed to disappear because of its inadequate sampling or its irregular emphasis of material.

In the case of achievement tests, most of the validation of the test must be accomplished in the original choice of its material. Concerning the principles governing the choice of material there is much confusion, particularly in subjects which are now in a period of development or transition. Should the test attempt to measure some typical course as represented by average textbooks or examinations used in the country, or should material representing progressive experimental courses be selected? If standardized tests were used solely as measuring instruments, the former type would probably be more desirable; but the truth is that standardized tests also set up in the minds of teachers standards for the selection of material and for instruction. For this reason they ought to represent, in some degree, forward-looking tendencies in curriculum development. Certain principles concerning the choice of materials may be safely given. Five distinct methods or criteria have been employed in choosing material for standardized tests in history.

1. Criterion of Extrinsic Use
This means that the choice of material is based upon its usefulness. This criterion, however, is difficult to employ unless studies of frequency of use have already been made, for it is usually more difficult and tedious to make the studies than to construct a standardized test.

2. Criterion of Composite of Textbooks
One of the most convenient and practical methods of selecting the material for a history test is to use a composite of what is given in several textbooks in the subject. This means that the selection of the items in a history test has been based in part upon an analysis of the contents of several textbooks in the subject, and that the test embodies the best of the materials actually occurring in the books which must be used in history classes.

3. Composite of Requirements in Courses of Study.
Very similar to the use of textbooks is the use of
requirements in courses of history. This means that
the selection of the items in a history test has been
based, somewhat, upon an analysis of the requirements
of several courses in the subject, and the construction
of the test embodies the most generally required
materials in the several courses.

4. Composite of Teachers' Examinations.
This criterion has been used effectively in some
history tests. This means that the selection and
validation of the items in a history test have been
based, somewhat, on the analysis and condensation of
many final examinations used in all parts of the country
over a period of several years.

5.  Judgment  of  Experts. 

In all cases, as a last resort, we go back to the
judgment of experts. Perhaps we should not say a last
resort, because the judgment of experts has been the
decisive factor in determining the content of
textbooks and courses of study.
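Criteria 2, 3, and 4 above amount to a tally: an item earns a place when it recurs across many textbooks, courses of study, or examinations. The Python sketch below illustrates that tally under stated assumptions only; the topic lists, the `composite_items` helper, and the `min_sources` threshold are all hypothetical, and the final pruning by the judgment of experts is not modeled.

```python
from collections import Counter

def composite_items(sources, min_sources):
    """Hypothetical helper: keep items found in at least `min_sources`
    sources, most widely shared first (ties broken alphabetically)."""
    counts = Counter()
    for items in sources:
        counts.update(set(items))  # count each item once per source
    kept = [item for item, n in counts.items() if n >= min_sources]
    return sorted(kept, key=lambda item: (-counts[item], item))

# Hypothetical topic lists drawn from three textbooks.
textbooks = [
    ["Stamp Act", "Louisiana Purchase", "Missouri Compromise"],
    ["Stamp Act", "Louisiana Purchase", "Monroe Doctrine"],
    ["Stamp Act", "Monroe Doctrine", "Embargo Act"],
]
print(composite_items(textbooks, min_sources=2))
```

The same function serves equally for lists of course requirements or of examination questions; only the source lists change.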

Selection  of  Test  Items. 

Makers of our best standardized history tests have
usually assembled at the start several times as many
items or elements as they intended to keep
permanently. It is generally recognized today that
a standardized history test should have at least two,
preferably more, duplicate forms, so that the test
may have its greatest usefulness in actual classroom
practice. Often a teacher wishes to re-test her class
soon after a first test is given. It is not desirable
to repeat the identical test, because certain of the
items might be remembered and, perhaps, especially
looked up in the interval. It may be affirmed here
that there is less danger of this than is commonly
supposed. However, it is usually possible, and hence
desirable, to prepare and use tests identical in form
but differing in content from the original test.

The other reason for preparing many more test items
at the start than are ultimately to be used in the
test is to make it possible to discard items which
seem to have little validity for the purposes of the test.

Textbooks, courses of study, and teachers' examinations
have been used to determine the most valid elements.

3)  Reliability  

The second most important criterion of a history
test is its reliability. Indeed, it is perhaps as hard
to construct tests with the desired reliability as
with high validity. If the testing time is of necessity
brief, give prime consideration to the reliability of the
test; if the testing time is long, give prime
consideration to validity, and only secondary attention
to reliability. To-day the maker of a good test reports
the reliability of his instrument in the manual of
directions accompanying the test, or in an article in a
technical journal which describes the test. One should
suspect the reliability of a test whose author has not
taken the trouble to determine it.

Reliability refers to the correlation between the
results of two forms of the same test. It is the
accuracy of a test: the accuracy with which the
test measures whatever it does measure. This is not
necessarily what the test claims to measure.
Unreliability is best exhibited by giving an alternate
form of a test and noting the change in score. A
perfectly reliable test is one for which the score
on a second trial remains exactly the same as on the
first trial (barring practice effect). A perfectly
unreliable test is one where the scores on two
successive repetitions of a test are such as might
occur by pure chance, such as one might draw out
of a hat containing a number of possible scores. In
estimating reliability we are not concerned with
repeating the identical test, but with giving a test
having the same construction, form, and subject matter
as the original test but different items.
In using tests similar in nature but differing in
content, one is really comparing the two halves of
a test double the size of either form.
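Since reliability is defined here as the correlation between scores on two forms, the usual statistic is the Pearson product-moment coefficient. A minimal Python sketch, assuming each list holds the same pupils' scores in the same order; the `pearson_r` helper and the sample scores are hypothetical. (Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.)

```python
from math import sqrt

def pearson_r(form_a, form_b):
    """Hypothetical helper: Pearson correlation between the same pupils'
    scores on two forms of a test."""
    n = len(form_a)
    if n != len(form_b) or n < 2:
        raise ValueError("need paired scores for at least two pupils")
    mean_a = sum(form_a) / n
    mean_b = sum(form_b) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(form_a, form_b))
    var_a = sum((a - mean_a) ** 2 for a in form_a)
    var_b = sum((b - mean_b) ** 2 for b in form_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary on both forms")
    return cov / sqrt(var_a * var_b)

# Hypothetical scores of six pupils on Form A and Form B.
form_a = [42, 35, 28, 47, 31, 39]
form_b = [40, 37, 25, 45, 33, 36]
print(f"alternate-forms reliability: {pearson_r(form_a, form_b):.2f}")
```

A coefficient near 1 means the pupils keep nearly the same standing on both forms; one near 0 is what scores drawn out of a hat would give.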

Factors Which Influence Reliability.
1. Objectivity. - When the judgment of the scorer enters
into the determination of the score, the test is called
subjective. A subjective test must be unreliable, for
it is obvious that the very same test paper may be
scored differently by two different scorers, or by
the same scorer at two different times. Objectivity
may be determined by inspecting the form of the test
and noting whether judgment is necessary in the scoring.
If one has to use considerable judgment in determining
the correctness of an item, then the test is subjective.
On the other hand, if only one correct answer is admissible
for each item, and that one is to be clearly indicated
in some way, then the test is highly objective.
Objective tests are usually recognition tests, as
opposed to recall tests.

The matter of objectivity is the factor of unreliability
which can most easily be remedied. Of all the
factors entering into unreliability, lack of objectivity
is perhaps the most inexcusable, for it is
usually possible, by exercising sufficient ingenuity,
to turn a subjective test into an objective test,
with little or no loss in validity.
2. Length of the test. - One of the most potent factors
in determining reliability is the length of the test.

Wood¹ says that there is a "crucial need for increasing
the validity and reliability of our educational
measurements. Even our best tests afford only
approximately accurate placement of individual students.
The needed improvement will be hastened by more care
in the construction of individual questions, by
drastically lengthening our examinations, and by
using a greater variety of appropriate question-forms
in them."

Two questions give more reliable information of a
person's knowledge or ability than one. A test with
fifty items gives a more reliable estimate of a
person's ability than a test of twenty-five items. A
limited sample of anything gives an untrustworthy,
unreliable picture of the whole.

1. Wood, Ben D. "Studies of Achievement Tests."
Journal of Educational Psychology, 17: 125-159, Feb. 1926.

3. Evenness of scaling. - Not only should the items
of a test be selected because of the fairness and
adequacy of the sampling of the topic to be tested,
but also because of their range of difficulty. If
the items of a test are equally spaced in difficulty,
there is no loss in reliability due to coarseness of
the measuring scale. Unless one knows definitely the
difficulty of the items of a test, there is a chance
that the items will be bunched in difficulty.
4. Condition of the pupils taking the test. - Other
factors of test unreliability have to do with the
conditions under which the test is taken, and the
variations in the pupils taking the test. With regard
to the latter, there is much apprehension. The
instability of the individual taking the test is
commonly regarded as the outstanding factor in causing
test unreliability. Most people feel that strain or
nervousness, or a physical condition not quite the
best, works havoc with their performance on an
examination. Not only does popular impression attribute
much of the cause of unreliability to this nervousness,
etc., but even experts in the field of testing make
similar statements. But opinion varies here.

Kelley¹ believes such phenomena have a certain
amount of influence on test reliability.

McCall² says, "An automobile horn, the lonesome howl
of Jack's dog, the bleating of Mary's lamb, a sudden
thought of the swimming hole, growing discomfort of
a strained posture, these and a thousand other large
and small internal and external influences register
themselves in the pupil's scores."

Symonds³ says, "In a study of reliability, I give
evidence to show that the general conditions of an
individual exert practically no influence, under
ordinary school conditions, on the results of testing.
On the contrary, not one of 2^2 school children
showed any appreciable unreliability in tests due to
this personal factor."

Symonds⁴ says, further, "It is remarkable how, under
the stress of taking a test, an individual can pull
himself together to do his best. This seems to be the
general rule rather than the opposite."

1. Kelley, T.L. "Note on the Reliability of a Test: A Reply
to Dr. ... Criticism." Journal of Educational Psychology,
15: 193-204, April 1924.
2. McCall, W.A. How to Measure in Education, p. 308.
3. Symonds, P.M. "A Study of Extreme Cases of Unreliability."
Journal of Educational Psychology, 15: 99-106, Feb. 1924.
4. Symonds, P.M. Measurement in Secondary Education, p. 295.

5. Familiarity of pupils with the technique of taking
tests. - Much of test unreliability is due to lack
of training in the technique of taking a test. For
instance, there may be a great variation in the speed
with which the tests are taken. In tasks which are
unfamiliar the pupil would often seem to be struggling
with what to do the first time, but, once he knew
what was expected, he would go much faster the second
time, with a clear understanding of the task.
Sometimes just the opposite would take place. On the
first trial the pupil would commence underlining
rapidly, with no clear conception of the problem.
Later he would catch on to what was expected, and the
new job of actually doing the problems would slow him
down.

What  Should  One  Expect  In  The  Way  Of  Reliability? 
One answer is: expect the highest reliability you
can get in the time allowed; that is, set the time
you can afford for testing, and then use the type of
test which will give you the highest reliability.
Much depends on the number of items that can be asked
in the time allowed, and also on the validity of these
items.

The answer to the question "What reliability should
one expect?" must remain a qualified one. The factors
which enter into unreliability are so many and obscure
that tests may fail to show high reliability for
unknown reasons.

With thirty minutes of testing, where fifty or more
objective questions are asked, we should expect to get
a reliability coefficient of over .80. With increased
testing time, involving a larger number of questions,
the reliability can be raised to any level desired. One
gets in test reliability what one is willing to spend
in testing time.

Teachers should come to realize that, in order to
make testing worthwhile, the test results should be
reliable. This implies continuing the test over two
or more school periods. One should aim to get a
reliability of .90 in all cases where the results of
the testing are to be used seriously in the
administrative or guidance functions of the school.
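How much longer a test must be to climb from .80 to .90 can be estimated with the Spearman-Brown prophecy formula, a standard result that the text does not itself state: lengthening a test n times with comparable items changes a reliability r to nr / (1 + (n - 1)r). Solving for n gives the lengthening needed to reach a target. A minimal Python sketch, assuming the added items behave like the original ones; both function names are hypothetical.

```python
def spearman_brown(r, factor):
    """Hypothetical helper: predicted reliability after lengthening a
    test `factor` times with comparable items (Spearman-Brown)."""
    return factor * r / (1 + (factor - 1) * r)

def length_factor_needed(r, target):
    """Hypothetical helper: lengthening factor needed to raise
    reliability r to `target` (Spearman-Brown solved for the factor)."""
    return target * (1 - r) / (r * (1 - target))

# Thirty minutes of objective testing giving r = .80, as in the text.
factor = length_factor_needed(0.80, 0.90)
print(f"lengthen about {factor:.2f} times, "
      f"or roughly {30 * factor:.0f} minutes of testing")
print(f"check: {spearman_brown(0.80, factor):.2f}")
```

The factor comes to 2.25, a little over an hour of testing, which agrees with the advice to extend a serious test over two or more school periods.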

The paucity of determinations of reliability is
pathetic. The tests for which reliability coefficients
or probable errors of a score can be found are a mere
handful. No better evidence of the youth of the
testing movement in the high schools exists than the
fact that workers can assemble a test and publish it
without demonstrating that the test is better than
someone else's. The time will surely come when
teachers will be trained to demand tests of a specified
reliability.

4) Other things being equal, choose a test that has norms.

A standardized history test without norms is
practically useless. A norm is usually an average, a median,
some percentile, a measure of variability, or some other
measure derived from one of these. It is most often a
mean or a median. The term "standard" means a "goal."
Norms, or standards, are essential for interpreting
the results of a test. The three most common types of
norms supplied with high school tests are:

(a)  Percentiles  (either  grade  norms  or  age  norms) 

(b)  Subject  norms 

(c) T-scores (or other measures based upon the
standard deviation or other measures of
variability).

A class average should be compared with the norm,
or average performance of other groups. A pupil's
score can only be satisfactorily interpreted in the
light of percentile norms, or some similar set of
indices, which will permit comparison of a pupil's
performance with that of a large number of pupils.
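To read a single pupil's score against norms, one converts it to a percentile rank or to a standard score. The sketch below assumes the norm group's raw scores are available as a list (a published test would usually give a table instead). `percentile_rank` counts half of any tied scores, and `linear_t_score` uses the simple linear form 50 + 10z, which is a simplification of McCall's T-score (McCall's scale is normalized through percentiles). Both names and the data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from statistics import mean, pstdev

def percentile_rank(score, norm_scores):
    """Hypothetical helper: percent of the norm group below `score`,
    counting half of any tied scores."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100 * (below + 0.5 * ties) / len(ordered)

def linear_t_score(score, norm_scores):
    """Hypothetical helper: standard score with mean 50 and SD 10
    relative to the norm group (linear, not normalized)."""
    z = (score - mean(norm_scores)) / pstdev(norm_scores)
    return 50 + 10 * z

# Hypothetical norm-group scores and one pupil's raw score.
norms = [18, 22, 25, 25, 29, 31, 34, 36, 40, 44]
pupil = 34
print(percentile_rank(pupil, norms), round(linear_t_score(pupil, norms), 1))
```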

Norms should accompany a standardized history test.
Most publishers include the norms in the manual of
directions accompanying the test, or on the tabulation
entry sheet which they sometimes provide.

Select tests for which there are satisfactory norms
derived from a large number of cases. This is evidence
that the test has been widely used.
5) Other things being equal, choose a test having alternative forms.

The best standardized history tests are made with
two or more duplicate forms, and test users should
insist on using tests that have alternative forms,
for several reasons.

If it is desirable to repeat a test during the
course of the year's work, there will be no danger
that previous test items have been looked up in the
meantime and remembered.

Another reason is that oftentimes a teacher wishes
to measure his class with greater reliability than a
single test affords, and alternative forms provide an
easy way of immediately extending the length of a
test. Moreover, the author of a test needs at least
two forms to determine its reliability.

When standardized tests were first marketed, authors
put the best and most valid material in one test. Now
that enough material must be assembled to cover the
requirements of two or more comparable tests, the best
material is sorted out equally among all the different
forms. For this reason, when there are several
alternative forms, no one form can contain all of the
most significant or most valid material. One has to
dip down into the reservoir for less important material.
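One simple way to sort the best material out equally among forms, offered here only as an illustration and not as the procedure test makers actually used, is a serpentine deal: rank the items by some index of worth and hand them out A, B, B, A, A, B, and so on. The validity indices and the `deal_into_forms` helper below are hypothetical.

```python
def deal_into_forms(items, num_forms):
    """Hypothetical helper: split (item, validity) pairs among forms in
    serpentine order so no single form collects all the best material."""
    ranked = sorted(items, key=lambda pair: pair[1], reverse=True)
    forms = [[] for _ in range(num_forms)]
    for i, pair in enumerate(ranked):
        round_no, position = divmod(i, num_forms)
        # Even rounds deal forward (A, B, ...); odd rounds deal backward.
        index = position if round_no % 2 == 0 else num_forms - 1 - position
        forms[index].append(pair)
    return forms

# Hypothetical pool of items with validity indices.
pool = [("Stamp Act", 0.8), ("Embargo Act", 0.7),
        ("Monroe Doctrine", 0.6), ("Missouri Compromise", 0.5)]
form_a, form_b = deal_into_forms(pool, 2)
print([name for name, _ in form_a])
print([name for name, _ in form_b])
```

The deal balances the forms only roughly; with a pool several times larger than needed, the least valid items can be dropped before dealing.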

There must always be a balance between the desire to
obtain a valid test and the desire to provide for
increased reliability by supplying several alternative
forms.

6) Choose a test because it is scaled.

A scaled test is one in which the items of the test
progress according to some quality such as difficulty,
aesthetic quality, legibility, etc.

A test scaled on the basis of difficulty is one
in which each item is more difficult than the
preceding one. Most standardized educational tests are
scaled on a basis of difficulty.

In order to make a test a true scale in the same
sense that scale is used in other branches of science,
two other features are essential. First, the items of
the test should proceed with equal increments of
difficulty. That is, item 3 of the test should be as
much more difficult than item 2 as item 2 was more
difficult than item 1, and so on. The other
qualification is that zero on the scale should represent
"just not any" of the thing which is being measured by
the test. Difficulty, in the sense in which it is used
above, may be determined in two ways: (1) by the combined
judgment of a number of competent people, (2) by the
decreasing percentage of a group taking the test who
can answer successive items. Thorndike¹ has shown that
there is a high correlation between these two
procedures in determining difficulty. The second method
is to be preferred when it is possible.

1. Thorndike, E.L., Bregman, E.O., and Cobb, M.V.
"The Selection of Tasks of Equal Difficulty by
a Consensus of Opinion." Journal of Educational
Research, 9: 133-139, Feb. 1924.

There has been much apprehension about the scaling of
tests. In general, the scaling of a test has been much
overemphasized in past discussion. To-day "number of
correct items" is nearly universal as a scoring formula.
In fact, provided certain general cautions are
observed, it is permissible merely to arrange the items
of which your test is to be composed in order of
difficulty, without paying much further attention to
scaling.
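The second method of determining difficulty, the percentage of a group answering each item correctly, is easy to compute from a tryout, and it gives the order in which the items may be arranged. A minimal Python sketch, assuming tryout results are recorded as True/False marks for each pupil; the data and both function names are hypothetical. Ordering items this way does not by itself give the equal increments of difficulty that a true scale requires.

```python
def percent_passing(responses):
    """Hypothetical helper: percent of pupils answering each item correctly.

    `responses` has one row per pupil, one True/False mark per item.
    """
    num_pupils = len(responses)
    num_items = len(responses[0])
    return [100 * sum(row[i] for row in responses) / num_pupils
            for i in range(num_items)]

def easiest_first(responses):
    """Hypothetical helper: item numbers (0-based), easiest to hardest."""
    pct = percent_passing(responses)
    return sorted(range(len(pct)), key=lambda i: pct[i], reverse=True)

# Hypothetical tryout: four pupils, three items.
tryout = [
    [True, False, True],
    [True, False, True],
    [True, True, False],
    [False, False, False],
]
print(percent_passing(tryout))
print(easiest_first(tryout))
```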

One  caution  to  be  observed  is  that  there  should  be 
some  items  on  the  test  so  easy  that  no  pupil  will  get 
a  zero  mark.  A  zero  score  is  called  an  undistributed 
score.  By  this  is  meant  that  the  scores  fail  to  show 
individual  differences.  Two  individuals  with  zero 
scores  may  really  differ  in  the  ability  in  question, 
yet  this  difference  will  not  appear  in  the  score. 
A  test  with  zero  scores  is  too  hard. 

Likewise, there should be some elements in the test
of such difficulty that no pupil in the class will
get a perfect score. Perfect scores on a test are
also undistributed scores and just as pernicious as
zero scores. Perfect scores fail to distinguish
between individuals of high ability. The introduction of
more hard elements into a test which yields perfect
scores would undoubtedly spread out the
perfect scores.

There should be no undistributed scores at any
point of the scale. Such undistributed scores
sometimes occur in the middle of a test when items are
bunched in difficulty.

A final qualification is that the test should be
of such general difficulty that the mean score of
a class is equivalent to about fifty per cent of the
total possible score on the test.

With these qualifications it makes little
difference whether or not the items are accurately scaled
for general school usage.
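The cautions above (no zero scores, no perfect scores, and a class mean near half the possible score) can be checked mechanically once a test has been given. A minimal Python sketch; the function name is hypothetical, and the 40 to 60 per cent band standing in for "about fifty per cent" is an assumed tolerance, not a figure from the text.

```python
from statistics import mean

def distribution_warnings(scores, max_score, band=(40, 60)):
    """Hypothetical helper: list signs that a test is too easy or too hard.

    `band` is an assumed tolerance around a class mean of fifty per cent.
    """
    warnings = []
    if min(scores) == 0:
        warnings.append("zero scores: add easier items")
    if max(scores) == max_score:
        warnings.append("perfect scores: add harder items")
    mean_pct = 100 * mean(scores) / max_score
    low, high = band
    if not low <= mean_pct <= high:
        warnings.append(f"class mean is {mean_pct:.0f}% of the possible score")
    return warnings

print(distribution_warnings([12, 25, 30, 18], max_score=50))
print(distribution_warnings([0, 50, 25], max_score=50))
```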

Most standardized tests for use in high school have
such a generous time limit that speed is not a factor
in the score.

Of course it should be understood that a test
which is not thoroughly scaled is useless for
experimental purposes (unless highly refined statistical
procedures are used).

7) Ease of Administration

Ease of administration should be considered in choosing
standardized history tests, especially from the more
practical standpoint. This criterion is obviously of great
importance. It should be judged from two points of view:
first, and most important, the clarity of the instructions
to the pupil, and second, the clarity of the instructions
to the examiner.

Directions to the pupils are properly printed in
the test booklets. If the test is broken into parts or
sections, each section should be preceded by the
directions for that unit, together with samples showing
the pupil how he is to indicate his answers. The
instructions to pupils should be full, and very simple
in phraseology. It must not be assumed that all pupils
will hold in mind long and complicated directions.

There is a very simple rule which is observed in
good textbook construction and other instructional
materials, viz.: place on the materials intended for
the pupil only those printed directions which apply
to him, reserving all instructions to the teacher or
examiner for a separate place, preferably the Manual
of Directions or Examiner's Guide.

A test whose administration cannot be learned by
the average teacher in an hour or two is not likely
to succeed. At the same time, the beginning examiner
must be warned about a multiplicity of small but
important details, such as instructions covering the
distribution of blanks, the filling in of pupil
information data, the observance of time limits, the
breaking of pencils, the prevention of disturbing
factors, and other requirements of good test conditions.

The directions for giving a standardized history
test in high school should cover the following points:

1. Statement as to what it is a test of.

2. Statement as to how many parts the test has,
if the class is allowed to go straight through.

3. Statement as to how many questions there are in
a given part of the test, if the class is
stopped at the end of each part.

4. Warning to turn the page, if the test continues
on the next page.

5. Warning not to dwell too long on any one item
or question.

6. A sample exercise, which the examiner usually
reads aloud.

7. Statement as to the mechanical form of answering:
whether underlining, crossing out, writing
a number in parentheses, etc.

8. Directions as to what to do in case a pupil
finishes the test before time is called.

Scoring. 

A  test  should  not  be  primarily  selected  because 
it  is  easy  to  score,  but,  at  the  same  time,  a  good  test 
ought  to  be  as  easy  to  score  as  possible. 

Publishers  usually  distribute  scoring  keys  with 
standardized tests. The key should be printed as near
to the margin of the paper as possible so that it can
be brought close to the pupil's answers. It should be
spaced exactly as the material is spaced.

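Scoring an objective test against a key reduces to counting agreements, the "number of correct items" formula mentioned under scaling. A minimal Python sketch, assuming each item has exactly one admissible answer and that omitted items are recorded as `None` and count as wrong; the function name and sample answers are hypothetical.

```python
def number_right(answers, key):
    """Hypothetical helper: count items whose answer matches the key.

    Omitted items (None) score zero, as does any answer not in the key.
    """
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must have the same length")
    return sum(1 for given, correct in zip(answers, key)
               if given is not None and given == correct)

key = ["b", "c", "d", "a", "b"]
pupil = ["b", "a", "d", None, "b"]
print(number_right(pupil, key))
```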
8) Other things being equal, choose a test that is inexpensive.

This criterion does not represent actual test superiority,
but is merely a criterion that answers administrative
exigency. Probably no teacher or school administrator
needs to be told to select an inexpensive test. There
is little or no correlation between the price of a test
and its value as a test.

To-day, publishing houses issue series of
standardized tests, selected and edited by expert
psychologists, print them in an attractive way on
good quality paper, and charge prices commensurate
with their quality.

Previous  criteria  are  more  important  than  this  one 
of  cost,  and  one  should  always  choose  the  best  test  from 
the  point  of  view  of  validity  and  reliability,  even 
if  high  priced. 

What we need in the field of testing is lists of
known valid elements of knowledge in each subject,
from which a teacher could readily construct her own
test, having the optimum validity for her class and
the desired reliability. With such lists available,
a city system could construct and print its own tests,
thereby freeing itself from dependence on publishing
houses and reducing costs to a minimum.

Test  making  is  still  in  its  infancy,  and  when  workers 
are  through  scratching  the  surface  and  producing  merely 
usable  standardized  tests,  they  will  proceed  to  dig  the 
ground  deeper  and  produce  exhaustive  lists  of  elements, 
together  with  their  difficulty  and  validity  at  various 
stages  of  learning. 




9) Summary

The chief criteria which should be kept in mind in
selecting history tests are validity, reliability,
objectivity, norms, duplicate forms, scaling, ease of
administration, and cost.

The two fundamental criteria for selecting history
tests are validity and reliability; all others are
subordinate. Objectivity in scoring is one important
factor in reliability. Validity refers to whether or
not a test accomplishes its purpose, and may be
subdivided into curricular and statistical validity.
The former refers to agreement with the content of a
desirable curriculum, and the latter to the statistical
testing of scores to determine their validity.
Reliability is synonymous with accuracy and is
determined by giving two forms of a test to the same pupils.
Among the chief factors that affect reliability are
objectivity, the length of a test, evenness of scaling
of the test elements, and the directions for giving
and scoring. Objectivity is the quality of a test
whereby there is no doubt as to what the correct
answers are.

There  is  a  crucial  need  for  increasing  the  validity 
and  the  reliability  of  our  educational  measurements. 

A scaled test is to be preferred, although for
actual school practice merely placing the items in
an order of difficulty suffices in a comprehensive
objective test. For experimental work, however, a
test should be accurately scaled. In selecting a
test one must be careful to see that it extends to
low enough and high enough limits.

To-day, the best standardized tests are accompanied
by standardized directions and norms, and
are provided in two or more alternative forms.

Though not of great theoretical importance, the
ease of administering a test and its cost are of
practical importance.



Chapter  4. 

Teaching History through the Use of Oral Questioning.

1) The uses of oral questioning.

2) Oral versus written tests.

3)  Conclusion. 




Chapter  4. 

1) The Uses of Oral Questioning.

Strictly speaking, oral questioning does not usually
constitute an examination. Oral examinations are
sometimes employed, but their value for the more serious
and final determination of achievement is doubted.
This is no argument against oral questioning. In many
ways, the teacher's daily questioning of her pupils is
of far more fundamental importance than her final
written examination. The point is that oral questioning
is more logically a part of initial instruction than
of final measurement, assuming that there are at
least five roughly distinguishable phases to the
complete act of instruction, as follows:

1. Initial presentation of materials to be mastered.
This phase consists of setting problems to be solved,
textbook readings and discussions, teacher's comments
on persistent difficulties in learning, etc.

2. Drill to support and fix the temporary mastery
gained under the first phase of instruction. This may
be drill proper, or it may mean applications and
reviews.

3. Diagnostic measurement at the period when phases
one and two are thought to be complete.

4. Re-teaching or remedial instruction upon any
weaknesses revealed under the third phase.

5. Final measurement and evaluation of a more general
and less detailed character than that of phase three.
This constitutes the final survey of achievement, and
leads to a judgment as to whether the individual or
class is ready to proceed to new work.

It should be noted that certain of these phases
are less prominent than others at times, the relative
emphasis varying with the character of pupils, teacher,
textbook, subject, motivation, etc.

Oral questioning plays its greatest role in the
first, second, and fourth phases of instruction as
presented above. It is primarily instructional; its
value for measurement is secondary. Oral
questioning as an art has a long history and a
considerable literature. It is worthy of more experimental
study than it has received to date.

2) Oral versus written tests.

Some reasons why the oral test is used more
frequently than the written test in class work are as
follows:

1.   In  the  oral  test  the  teacher  can  check  up  the  errors 
of  individuals  on  the  spot,  while  in  the  written  test 
correcting  of  papers  outside  of  the  class  period  is 
involved. 




2. In the oral test the whole assignment can be
covered, while in the written test this is difficult,
and teachers feel that they must go over the entire
lesson in the class period.

3.  Oral  questioning  is  also  important  in  securing  the 
proper  sort  of  attention.  Every  member  of  the  class 
should  be  accustomed  to  pay  such  attention  at  all 
points  in  the  recitation,  that  he  can  take  up  the 
work  where  the  one  who  is  reciting  has  left  off. 
Class  control  is  thereby  enhanced. 

4.  Oral  expression  is  vastly  more  important  than 
written  expression.  Training  in  speaking  is  invaluable. 

5. It requires quick thought.

6. It makes the student more logical in speaking.

7. It prevents misunderstanding of the question by the
student, or of answers by the teacher.

Some  advantages  of  the  written  test  in  class  work 
are  as  follows: 

1. The written test is generally more economical than
the oral test. The reason for this is that, in the
written test, all the pupils are mentally active during
the entire period of the test, while in the oral test
only one pupil is necessarily engaged at a given time.
Further than this, by use of the written test, the
teacher can test the knowledge and the skill of all
of the members of his class much more extensively
than he can by the oral test.

2. The written test gives the pupil time to think, to organize his answer, and to present it more logically.

3.  The  student  is  more  at  ease. 

4.  There  is  less  chance  for  bluffing. 

5.  It  gives  all  an  equal  chance  on  the  same  question. 

6.  It  gives  a  chance  at  more  than  one  question. 

7.  It  is  fairer. 

8.  It  is  more  thorough. 

9.  It  calls  for  more  preparation. 

3) Conclusion

Strictly speaking, oral questioning does not usually constitute an examination. Oral questioning is more logically a part of initial instruction than of final measurement.

As far as practicable, tests for knowledge should be written rather than oral. The written test is generally more economical than the oral test, and students and teachers alike favor short written tests over oral ones.

Probably, frequent short written tests, supplemented by occasional long written examinations, constitute the best form of routine testing.




Chapter  5 


Teaching History through the Use of the Traditional Examination.

1)  Nature  of  the  traditional  examination. 

2)  Advantages  of  the  traditional  examination. 

3) Disadvantages of the traditional examination.

4)  Conclusion. 




Chapter  5 

1) Nature of the traditional examination

The rise of the objective or new-type examination makes it necessary to distinguish between the long-established form of test and the more recent and more objective type of examination. The former has come to be known as the "traditional", "essay", or "discussion" examination. Until the last few years this sort of examination was employed almost exclusively in the high school, and it is still perhaps used more than any other.

The traditional examination needs no definition. It is the examination which we all recognize as consisting of five, ten, or more questions, beginning most often with "state in full", "describe", "tell what you know", "analyze", "discuss", "outline", "explain", "summarize", etc. The pupil is free to write what he chooses in response to the stimulus question. In its mechanics it contrasts with the newer objective examination, which calls for underlining, crossing out, checking, etc., instead of discussion.

The traditional examination cannot be scored mechanically by keys or stencils, but must be evaluated or scored subjectively.

Within the last few years a considerable amount of space and time has been devoted to condemning traditional examinations, and to showing, or attempting to show, that those of the new type are superior. Many valid points have been made, but frequently, perhaps usually, the merits and advantages of traditional examinations have been overlooked.

All too often it has either been explicitly stated, or else implied by the trend of the discussion, that the abolition of traditional examinations is desirable. It is very unfortunate that such an attitude should have been taken and expressed. It was only natural that the protagonists of new-type tests should, in their enthusiasm, overestimate and overstate their value, but in many instances this has been done in so extreme a fashion that there is little apparent excuse for it.

A thoughtful consideration of the question will undoubtedly lead to the conclusion that each of the two types above has its peculiar points of strength and its distinct advantages in actual use. Therefore, it is most emphatically believed that any teacher's complete testing program should include some use of both kinds. In other words, the question is not one of deciding whether the essay or the new-type examination is the better and then using it exclusively, but rather of determining the occasions and circumstances under which each is most valuable, and then employing each accordingly.

It is the purpose of this chapter to outline clearly, and as completely as possible, both the advantages and the disadvantages of the traditional examination, and to arrive at a conclusion at the end of the chapter.

2) Advantages of the traditional examination

The traditional or essay examination is easier to make, especially as regards the amount of time required. The comparatively few questions of this type needed for an examination can be made in much less time than the relatively large number of new-type items. Moreover, since essay examinations can almost always be written on the blackboard, or even given orally, with satisfactory results, they are little trouble to employ.

At present, most teachers are more familiar with traditional or essay tests than with new-type tests and are therefore better qualified to make and give them. This condition is rapidly changing, however, and will not long be a valid argument for their use.

It is true that a teacher need not study a great deal to acquire a fair knowledge and understanding of the new-type examination, but many teachers have been unwilling to put forth the requisite effort, even when their attention has been directed along this line. Until the many teachers of this sort are better trained and informed about new-type tests, it is probably unwise to attempt to compel, or even induce, them to make large use of such tests.

It is probably true in practice that essay-type tests encourage less guessing. Such a test often allows a pupil a considerable degree of freedom in choosing the form of his answer, and not only allows but even requires him to select, from a fairly large stock of information, the portion which he wishes to use. Moreover, the situation is frequently such that many items which might be selected are neither absolutely right nor wrong. Thus, judgment and other qualities are called into play much more than on most new-type tests.

Consequently, essay examinations reveal certain facts concerning individual differences in the quality of mental activity which are not shown by those of the other variety.

In fact, it has been claimed that the one most significant merit of traditional or essay examinations is that they appear to test certain desired outcomes and mental processes, other than memory, better than does the other kind. Although new-type exercises may measure originality, initiative, the power to organize, to interpret, to analyze, and to synthesize, and various other reasoning processes to some extent, they appear to do so much less thoroughly and satisfactorily than well-made discussion questions. In other words, the latter possess higher validity for these purposes than do the former. Monroe says,¹ "traditional examinations call for the functioning of distinct types of mental ability not demanded to so great a degree in any other kind of school work, and should not, especially in the case of final examinations, be abolished, nor replaced entirely by any other form of school exercise. Teachers and pupils, more and more, should be impressed with their unique educational value."

They  also  provide  opportunities,  such  as  cannot  be 
afforded  by  tests  to  which  the  answers  are  single 
words  or  phrases,  for  measuring  pupils'  power  to 
express  their  thoughts,  to  write  well,  to  use  correct 
English,  and  other  related  abilities  and  habits. 

The essay examination, if properly administered, not only measures ability to organize and express ideas, but also gives training in such ability. Even if it is not admitted that giving such training is an important function of an examination, little if any objection can be raised to the incidental benefits of this sort which it can be made to yield.

1. Monroe, W. S., "Written Examinations and Their Improvement." The Historical Outlook, 14: 306-318. Nov. 1923

It is also claimed that the discussion type of examination can be more easily and directly adapted to various kinds of subject-matter. This advantage should not be overemphasized, since the large number of varieties of objective tests makes possible their adaptation to many kinds of subject-matter; but on the whole it still appears to exist.

The claim has been made that the ordinary discussion type of examination encourages less dishonesty on the part of pupils, because it is much harder to cheat on it. There is some justification for this claim, because the answers in the new-type examination are so short that it is relatively easy to see a neighbor's answer at a hasty glance. Not infrequently, however, in the case of an essay examination, a mere glance yields enough information to enable a pupil to profit by it. Nevertheless, on the whole, it cannot be denied that there is some validity to this argument.

3) Disadvantages of the traditional examination

Tests of the traditional variety are less reliable than those of the new type. A considerable mass of data, concerning which more will be said a little later, has been accumulated in support of this statement.²

There appear to be two chief causes for the difference in reliability. One is that the typical essay examination contains comparatively few separate questions or exercises, and these are too few in number to constitute a satisfactory or reliable sampling of pupils' knowledge or achievement. Too few topics are covered, and there is too much chance that these few will be among those which some pupils just happen to know and others happen not to know. This defect could be remedied by including a much larger number of questions, but doing so makes a traditional examination too long.

The second chief cause of the lower reliability of traditional examinations is that they are subjective in their scoring. The answers to most questions of the traditional type cannot be scored as definitely right or wrong, but may be partially right to almost any degree. As a result, great differences of opinion exist among supposedly competent teachers as to how much credit shall be allowed for the same answers. According to Odell,³ the evidence showing the unreliability of teachers' marks has consisted of, or been based upon, the marks given traditional examination papers, and therefore tends to prove the point just made. Starch and Elliott⁴ report, as one of the chief results of a series of investigations relating to the reliability of grading work in an essay-type history test, that "the marks assigned to the same paper by different teachers vary enormously, in fact much more widely than the average teacher would anticipate." Douglass⁵ tells of an experiment in which twenty-eight American History examination papers were marked by four high school teachers of history, and then re-marked by the same teachers several months later. For all four teachers, the average difference between the marks given the same paper by the same teacher was greater than five per cent. In one instance there was a difference of twenty-five per cent between the two marks given the same paper by the same teacher. The examination used was, of course, of the traditional type.

2. Odell, C. W., Traditional Examinations and New-Type Tests. p. 183
3. Odell, C. W., Traditional Examinations and New-Type Tests. pp. 5-7
4. Starch, D., and Elliott, E. C., "Reliability of Grading Work in History." School Review, 21: 676-681. Dec. 1913
5. Douglass, H. R., Modern Methods in High School Teaching. p. 368

The unreliability in scoring responses to essay examinations that is due to teachers' subjectivity is only in part caused by disagreements among teachers as to just what the correct answers are. Much of it also results from the fact that teachers do not agree as to the relative importance, and therefore the weighting, of the different parts of the examination. Some teachers attempt to weight according to supposed difficulty, others according to the importance of the facts or of the mental activities called for, and others on still different bases. Their judgments differ not only as to how difficult the various exercises or questions are, but also as to how much weight should be assigned to particular questions on whose relative difficulty they agree. Most teachers who determine weights on this basis count the more difficult questions more heavily, on the ground that greater ability is required to answer them correctly. Some, on the other hand, count the easier questions more heavily, because they believe that it is a greater discredit to pupils to be unable to answer these, and that pupils should therefore be penalized more heavily if they fail to do so.

Still further, teachers are influenced in the marks they give pupils' written work by the pupils' past records, by the general opinion which they have of the quality of their work, and by handwriting, neatness, language, usage, style, and so forth. In many cases teachers are unconscious of being so influenced, but the condition is nevertheless very real and almost impossible to avoid. Moreover, the merit of a paper as a whole is likely to influence the marks given separate questions. If the answers to the first few questions have been very good, the marker is liable to rate any later poor answers too high. Similarly, if the first few answers have possessed little merit, later good answers are liable to be discounted.

It has been related that two students in an English training college, named respectively Smith and Jones, were close friends. They were both members of the same English class, and as such had to hand in essays every fortnight. They consulted one another in their work, exchanged ideas, and produced essays apparently very similar. The first essays were rated "Very Good" for Smith's and "Very Fair" for Jones's, and the second, third, and succeeding essays received the same marks, except that Smith occasionally had his raised to "Very Good Indeed" and Jones had his reduced to only "Fair". On one occasion they exchanged and copied each other's essays; that is, Jones handed in the essay really written by Smith, and Smith that written by Jones. Nevertheless, each received on the essay not his own the same mark he had been receiving on his own work. Evidently the instructor had the firmly established idea that Smith could write essays of considerable merit whereas Jones could not.

A most striking illustration of the high degree of unreliability of marks given traditional examination papers is the following.⁶ About half a dozen expert readers were marking a set of history papers. One of these readers, for his own convenience, prepared what he considered a model paper, that is, a paper containing supposedly correct answers to all the questions. By accident this model paper was included with the students' papers and passed on to several other readers. They rated it on the supposition that it was a student's paper, and assigned it marks varying from forty to ninety per cent.

Even when teachers are aware of the large element of variability commonly present in their marks and endeavor to reduce it by careful marking, they are unable to do so to a satisfactory extent. Some reduction undoubtedly can be made, but Ruch,⁷ as well as others, offers experimental evidence which shows that even teachers who are aware of the larger sources of unreliability and endeavor to avoid them still disagree markedly, in a few cases by over fifty per cent, as to the marks to be assigned papers.

A disadvantage of the traditional examination that is connected with unreliability, but yet worth mentioning separately, is that pupils often realize that the marks they receive are to some extent due to chance, or to other causes which should not be operative. Not infrequently, when pupils compare papers after receiving them back marked, they find that answers containing almost exactly the same material have received different numbers of points, or even that an answer containing more of the facts called for than another has been marked the same, or sometimes lower. Pupils feel, therefore, that many of the ratings they receive are really unreliable and unjust. On the other hand, the comparatively high objectivity of new-type tests produces a distinctly favorable reaction.

6. Wood, Ben D., "The Measurement of College Work." Educational Administration and Supervision, 7: 301-334. Sept. 1921
7. Ruch, G. M., The Improvement of the Written Examination. pp. 55-62

An essay examination may reveal a general lack of knowledge on a certain topic, but it rarely identifies the exact points which need attention. New-type examinations point out very definitely the particular things which are not known, and thus pave the way for very definite and purposeful remedial instruction.

To some extent traditional examinations discourage systematic and worthwhile review. Since they generally touch upon only a few topics out of the large number included in a course, pupils are likely to take a chance that the few which such tests cover will happen to be among those that they think they know fairly well. Sometimes pupils do not do this, but try to guess what topics will be dealt with on the examination, and then review those intensively, neglecting all others. Others make little or no attempt to study, because they think it too much a matter of chance whether doing so will aid them materially in responding to the examination questions.

Another disadvantage of the traditional examination is that it does not fairly test the achievement of pupils whose powers of expression are poor. That is to say, because of difficulty in organizing their thoughts and expressing them in clear language, pupils may know more about the question or topic to be discussed than they indicate in their responses. Thus their answers depend to some extent upon their ability and knowledge in lines other than the subject being tested. It is certainly desirable to test ability in expression, but it should not be done in such a way that a pupil's mark in history or algebra, for example, is a compound mark expressing a mixture of his achievement in that subject and in language, with the proportions of the two unknown.

More or less similar to the disadvantage just mentioned is the fact that traditional examinations frequently test speed of writing to an undesirable extent. Some pupils may write so much faster, and with so much less fatigue, than other pupils of the same actual ability and achievement in the subject being tested that very material differences result in the marks they receive on their examination papers. This can easily be avoided by allowing sufficient time for all pupils to finish, but, as will be shown later, doing so often leads to certain other undesirable results.

Another limitation, likewise closely connected with those just stated, is that too much of the time spent in answering essay examination questions is ordinarily devoted to what may be called the mere mechanics of answering. That is to say, the act of writing and the determination of the form of answers consume much of the pupil's time and attention which should be devoted to real thinking about the questions asked.

Because language and handwriting abilities play such a large part in pupils' answers to discussion examinations, and further, because if sufficient time is given to these matters there is frequently not enough time left to devote the proper amount of attention to the subject-matter itself, pupils tend to develop hurried and careless habits of expression and writing. They hasten to put down on their papers whatever pertinent facts they know, and pay little attention to the form in which these are expressed. This effect is made still worse by the fact that many teachers, especially high school teachers of subjects other than English, pay little attention to such matters as spelling, composition, punctuation, sentence structure, quality of handwriting, and so forth. Even if they do correct mistakes along these lines, the pupils' attention is not usually called to the corrections in a way that makes them very effective. Because of these facts it may even be said that many essay examinations give positive training in ungrammatical and unrhetorical expression, poor handwriting, and other undesirable habits.

It is practically impossible to give essay examinations in such a way that they test the rate of a pupil's response or thinking in the subject dealt with. They frequently serve to test rate of writing, but not rate of mental activity. It is not necessary, or even highly desirable, that all examinations should measure rate, but it would be very unfortunate if none did so. In practically every activity outside of school life, the person who can perform a task as well as another, and do so in less time, is rated as more efficient, and the same should be true in much of the rating of school pupils and their work. In actual practice, however, essay examinations as administered and marked have frequently tended to produce the impression that correct answers are equally valuable whether given in a short or in a long time.

One point in which traditional examinations are at a considerable disadvantage compared with new-type ones is the difficulty of scoring. Careful and accurate scoring of the answers to traditional questions is relatively difficult and requires considerable time. For many already overworked teachers, this burden, added to their other duties, is sufficient to lower their general physical and mental vitality, and therefore their teaching efficiency. Some teachers avoid this result by reducing the number of examinations below a desirable minimum, and others by scoring pupils' responses so hurriedly and carelessly as to lose many of the possible benefits to be derived from giving examinations. It is also frequently possible for pupils to deceive themselves as to the correctness and quality of their answers in discussion examinations, even though the papers have been well criticized and marked by the teacher.

It is usually difficult, if not absolutely impossible, to secure satisfactory norms from essay examinations. This difficulty places very decided limitations upon the possibility of comparing achievement in different classes or groups of pupils, and thereby makes it harder for teachers and others to learn whether or not the achievements of their pupils are what they should be.

Because it is comparatively easy to dash off a few discussion questions in almost as short a time as is required to write them, many teachers fall into the habit of doing so, and of giving little or no thought to the selection and formulation of the questions and exercises employed. As a result, important topics and portions of the subject-matter studied are often entirely or almost entirely neglected, whereas others are dealt with much more frequently than there is any need for. Hastily made questions are liable to be poorly worded, obscure, and indefinite, with the result that teachers in scoring either penalize pupils who cannot understand the questions, although the fault is the teachers' own, or else give credit for answers which are really not what was wanted. Furthermore, such careless formulation of questions serves to increase the unreliability of marks, because it secures poorer samplings of the subject-matter covered.

It is frequently fairly easy for pupils, especially those of more than average intelligence, to bluff on essay examinations. A pupil may know nothing or practically nothing of what is actually called for by a particular question, but if he has some knowledge of the general topic with which the question is connected, and perhaps also some skill in guessing, he can frequently produce an answer for which he will receive much more credit than he deserves. This is especially true where the teacher is marking the papers hurriedly and carelessly. She is liable to notice that the pupil has written an answer of considerable length, containing a number of words and expressions which have something to do with the topic, and therefore, without careful examination of what is written, to give him a fairly good mark upon it.

Some of the claims made for traditional examinations are not fully valid. Frequently the time limits upon such examinations are so short that they really become chiefly tests of memory rather than of reasoning, organization, and other abilities, even though they might provoke activities of these sorts if sufficient time were given. Moreover, the questions or exercises which they contain frequently deal with factual material to just as great an extent as do those in new-type tests.

Still further, several varieties of new-type tests do stimulate and measure pupils' critical ability, their discrimination, judgment, and so forth. Such types as the true-false, which requires them to decide whether statements are true or not, the multiple-answer, in which one or sometimes more correct answers must be selected from a group, and others can be made to serve these purposes.

Even though new-type tests deal largely with separate points or facts, the material covered can, if desired, consist of general principles, rules, laws, and so forth, as well as mere bits of information.

Therefore, although it is probable that traditional examinations do test a greater variety of mental processes than do objective tests, it is not inevitable that they do so, and the latter also can be made to measure, at least to some degree, most of these processes.

4) Conclusion

Although there has been much recent unfavorable criticism of traditional examinations, they should not be entirely discarded in favor of new-type tests, but should be used on some occasions. Each of the two general types just mentioned has its peculiar merits and advantages, and should be employed when it best fulfills the desired end. Traditional examinations are usually much easier to prepare, test a number of mental processes better than does the new type, offer little opportunity for guessing and perhaps for cheating, are less liable to confuse the pupil, and in several minor ways are to be preferred. On the other hand, traditional examinations are, as is shown by a considerable mass of evidence, less reliable than new-type tests, because they do not secure as good samplings of pupils' ability and knowledge, and because their scoring is relatively subjective and more difficult. Bluffing is also easier.




Chapter 6

Teaching  History  through  the  Use  of  the  Standard  Test. 

1)  Growth  of  the  standard  test. 

2)  Nature  and  purpose  of  the  standard  test. 

3)  Additional  advantages  of  the  standard  test. 

4)  Some  limitations  of  the  standard  test. 

5)  Example  of  the  standard  test. 

6)  Conclusion. 




Chapter  6 

Teaching  History  through  the  Use  of  the  Standard  Test. 

1 

1) Growth of the standard test. Standardized
examinations have just completed the first quarter
century of their existence. From a few pioneer attempts
by Rice, Thorndike, Stone, Courtis, and others in the
fields of spelling, arithmetic, and reading, the
movement has grown until conservative estimates place
the total number of available tests and scales at no
fewer than five hundred; there are probably
considerably more. It is impossible to secure even
approximate estimates of the number of standard tests
administered annually. Several educational tests have
passed the million mark in annual sales, and in one or
two cases two million is a more nearly correct figure.
The total number of standard tests sold during the
past year (1928) is probably at least twenty million,
possibly somewhat more.

These figures, estimates as they are, point to the
importance of standard tests as measures of the
results of teaching. It seems certain that the curve
of the use of standard tests is rising more rapidly
than that of the increase in school population.

The use of standardized tests now tends no longer to
be thought of as an experiment, or as something
definitely apart from regular instructional and other
activities. Instead, it has come to be an integral
part of the work of teachers in many systems, both
large and small. Undoubtedly the most notable example
of a large city system in which this is true is
Detroit, where an extensive program was developed, but
in hundreds of others it has a prominent place. Some
large cities, such as Philadelphia, Detroit, and
Denver, construct many of their own standardized tests
instead of purchasing them.

1. From Ruch, G. M., The Objective or New-Type
Examination, pp. 21-22.

2) Nature and purpose of the standard test.

In the first place, so-called objective tests may be
divided into two groups: standardized tests and the
new-type examination. Originally, and in its narrowest
sense, "standardized" was applied to a test that had
been given widely enough that the results indicated
what might be expected of pupils of a given age,
grade, or other homogeneous group.

In general usage, however, the adjective
"standardized" or "standard" also implies that the
test in question has been carefully constructed
according to certain general principles, and embodies
exercises of such forms that pupils' responses are
relatively, if not absolutely, objective.

Furthermore, practically all tests which merit the
name standardized are commercially available, that is,
may be purchased from a publisher by anyone desiring
to do so.

The essay examination suffers from one major defect
not inherent in the standard test or the newer
objective examination, viz.: experience and experiment
have shown that the results of an essay examination
cannot be evaluated fairly by human minds. Its
inaccuracies are those of the human mind and human
prejudice. Such examinations seemingly cannot be freed
from the personal equation.

To the degree that an examination mark or grade
reflects the knowledge, attitudes, and prejudices of
the marker of that examination paper, the examination
is not a true measurement, since all are surely agreed
that it is the accomplishment of the pupil which is to
be measured. If the same pupil's paper is graded all
the way from 40 to 90 (as many investigators have
1
found), there is but one conclusion to be drawn, viz.:
the pupil has not been measured. To be at the same
time a "40" pupil (a dunce) and a "90" pupil (a
candidate for class valedictorian) is not only
unthinkable but palpably untrue. Such a finding raises
the suspicion that he is neither, a conclusion that can
well be supported on the ordinary logic underlying our
basic theorems of possibility.

1. Ruch, G. M., The Objective or New-Type Examination,
p. 20.

To be taken at face value, any examination result
must meet many stringent criteria, and one of these
is that it is a measure of the pupil, not of the
teacher, his class, or the school system. Yet it must
be admitted by any fair-minded student of the
literature that the traditional examination is prone
to tell us as much, or almost as much, about whom the
pupil had for a teacher as it does about the
educational equipment of the pupil himself.

The  technical  term  for  this  weakness  in  the 
common  essay-type  examination  is,  in  our  modern  educa- 
tional terminology,  subjectivity  of  marking.   It  was 
as  a  relief  from  this  admitted  weakness  that  the  stan- 
dard test  and  the  objective  examination  were  introduced. 

The  standard  test  was  introduced  to  serve  several 
purposes.  These  serve  to  orient  our  thinking  about 
tests  and  examinations  in  general.  The  principal  aims 
of  the  standard  test  may  be  listed  as  follows: 

1. They (as the name implies) represent an attempt
to control or standardize the conditions of the
examination period with respect to directions, time
allowances, method of responding, etc.

2. They are objective or impartial; i.e., the personal
equation of the examiner is minimized or eliminated:
minimized in the administration, and eliminated almost
or quite completely in the scoring of the examination.

3.  They  provide  norms  or  standards  (as  the  name  fur- 
ther implies)  by  which  the  scores  of  individual  pupils 
may  be  evaluated  and  interpreted  in  the  light  of 
facts.  Such  facts  are  the  performances  of  large  num- 
bers of  supposedly  typical  pupils  on  the  same  tasks. 

These  aims  can  all  be  attained  to  degrees  com- 
mensurate with  the  practical  needs  of  education,  the 
third  aim  being  the  most  difficult,  and,  on  the 
whole,  decidedly  the  least  important. 
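The third aim, interpreting a pupil's score in the light of a norm group's performance, can be illustrated with a percentile rank. The following is a minimal Python sketch, not a procedure taken from any test manual: the norm-group scores are hypothetical, and counting ties as half is one common convention among several.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percent of the norm group scoring below `score`, counting ties as half.

    `norm_scores` stands for the recorded scores of a (hypothetical) norm group.
    """
    if not norm_scores:
        raise ValueError("the norm group must contain at least one score")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)          # scores strictly below
    ties = bisect_right(ordered, score) - below  # scores equal to `score`
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm group of ten pupils (illustrative numbers only).
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 38]
print(percentile_rank(25, norm_group))  # 55.0
```

A real norm table would summarize hundreds or thousands of pupils, but the lookup is the same: the raw score becomes meaningful only by its position among the scores of comparable pupils.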

One characteristic of standard tests is their
comprehensiveness within the field measured. The
makers of standard tests constantly emphasize that
justice in comparing two or more communities on the
basis of these tests depends upon the extent to which
the tests are comprehensive, that is, the extent to
which they apply equally to the needs of the different
communities they are designed to serve. The absence of
a given content from a test would react against a
community which emphasized that group of elements, and
might act in favor of one which neglected it.

Standard tests are so constructed, through the
selection of the elements which they contain, that
they may be as comprehensive within the field to be
measured as it is possible to make them.

Another characteristic of standard tests is the
universality of their content. A field of teaching
which is localized or provincial in character cannot
be placed successfully in a standard test.
Theoretically, such a test could be standardized for
a given restricted community where the localized
subject matter was used, but actually the success of
standard tests lies in their almost complete use of
universalized subject matter. A test in arithmetic,
for example, should utilize the subject matter in
arithmetic of admittedly universal teaching.
Arithmetic is a unit of subject matter that is widely
distributed in much the same way, and is, therefore,
peculiarly adapted for use in standard tests. Other
school subjects, geography and history for example,
have far less universality of content, and are,
therefore, to that extent restricted in the values of
the standard tests which can be utilized for them.
1
Rugg is convinced "that our judgment and grading of
pupil re-action will, at least, be refined through the
use of standard history tests. They are valuable,
first, to check the basic aims and outcomes of
history."

1. Rugg, E., "Character and Value of Standardized
Tests in History," School Review, 27:757-771,
Dec. 1919.




3) Additional advantages of the standard test.

In general, standardized tests are more carefully and
scientifically made than non-standardized ones. Their
authors are usually better versed in methods of test
construction, and have a wider knowledge of subject
matter, than regular classroom teachers. Therefore,
standardized tests conform more closely to general or
best practice than ordinary examinations do, and thus
tend to produce uniformity. Since more time and care
are devoted to their construction, they are generally
more objective, reliable, and valid than
non-standardized tests, although the latter can be
made equal to them if sufficient pains are taken to
do so.

The distinguishing feature of standardized tests,
that norms have been established (previously referred
to), is an advantage in that it makes possible the
comparison of pupils with others. Norms are, however,
frequently too general to be of high value, and for
this reason are even liable to misinterpretation and
misapplication. Also, many tests may serve their
purposes without such comparisons being needed.

A third advantage of standardized tests is that their
use saves time in both preparation and scoring. The
objection may be raised, however, that time devoted to
thoughtful preparation of test exercises is profitably
spent and should not be lessened. Furthermore, the
saving of time in scoring is little, if any, when
standardized tests are compared with the best new-type
tests made by teachers.

A fourth advantage of standardized tests is that
most good standard tests possess two, or occasionally
more, equivalent forms, thus enabling one to test the
same abilities of a group of pupils two or more times
with tests of the same difficulty. This makes it
possible to measure progress much more accurately
than if such instruments were not available. It is
difficult for teachers to prepare such duplicate forms
in a manner that insures equivalence.

4) Some limitations of the standard test.

1. Standard tests are inflexible and cannot be
closely adapted to the idiosyncrasies of local school
conditions. They are, of necessity, general enough to
meet moderately well a wide variety of curricula.

2.  In  view  of  the  foregoing,  they  need  constant 
supplementation  in  a  complete  measurement  program. 

3. Standard tests are somewhat expensive. Prices
range from about one cent per pupil to at least ten
cents per pupil. This, of course, is a practical
limitation, not a theoretical one. It should also be
noted that there is considerable correlation between
cost and worth. As is the case with all commercial
products, tests are sold in a competitive market, and
costs are reckoned upon the basis of the expenses of
production.

Ruch, G. M., The Objective or New-Type Examination,
pp. 22-23.

4. The majority of standard tests are of little value.
A large number are nothing more than "examinations
with norms," produced by persons without special
training or knowledge of test construction. If one
hundred of the best were selected and the rest
destroyed, the loss would be negligible.

Only  the  first  mentioned  of  these  limitations  of 
the  standard  test  is  serious.  The  others  may  be  over- 
come  by  careful  selection,  by  the  planning  of  measure- 
ment programs,  and  by  efficient  school  budgeting. 

It would appear to be impossible to adopt the standard
test as the sole element in a measurement program.
It might well repay the cost, but it is to be
doubted  whether  local  needs  could  ever  be  met  satis- 
factorily. Both  the  traditional  and  the  new-type  ex- 
amination are  free  from  this  limitation  of  non-adapt- 
ability to  local  school  curricula. 

1
Another critic, Tryon, says of the limitations of the
standard test that "history test-makers have had
difficulty in devising exercises that test a variety
of mental processes, such as reasoning, association,
and comparison. Too many of the exercises now
available test memory only."

1. Tryon, R. M., "Standard and New-Type Tests in the
Social Studies," 16:172-178, April 1927.




1 

5) Sample of a standard test.


Part I

Directions. Read each of the following statements
very carefully. If a statement is true, place a plus
(+) in the parentheses following it; if it is false,
place a (o) in the parentheses following it. If you
are not sure whether a statement is true or false,
leave the parentheses blank. Do not guess.

Twenty minutes.

Samples.

a. George Washington was the first president of
the United States ( + )

b. The panic of 1857 had no effect upon the
economic life of the South ( o )

1. After the passage of the Toleration Act of
1649, Maryland enjoyed greater freedom of
conscience in religious matters than did
Rhode Island ( )

2. Horse-racing, corn-husking bees, and card
playing were forbidden by law in Colonial
America ( )

3. The Proclamation of 1763 forbade colonists
to settle in the territory acquired by
Great Britain from France as a result of
the French and Indian War ( )

4. The economic development of Pennsylvania
was retarded by William Penn's refusal to
admit German immigrants ( )

5. The principal motive of the majority of
those who left Europe for America during
the Colonial period was to better their
social and economic conditions ( )

If you finish before the time is
up, go on to Part II.

Number right ______          Number right ______
Number wrong ______          Number wrong ______
Number omitted ______        Right minus wrong (Score) ______

1. Excerpts from Columbia Research Bureau American History
Test. Published in 1926 by the World Book Co.,
Yonkers-on-Hudson, N.Y.


Part II

Directions. Below are eight groups of items, each
of which is divided into two columns. Each item in
the left-hand column is numbered. Each item in the
right-hand column is followed by parentheses. Place
in the parentheses the number of that item in the
left-hand column that is associated with the item
in the right-hand column. Each group is a separate
problem; do not match items in different groups.
Twenty minutes.

Samples.

a.  1. 1492            Declaration of Independence ......... (   )
    2. 1620            Discovery of America ................ (   )
    3. 1776

I.  1. Pennsylvania    Patroon system ...................... (   )
    2. Massachusetts   Complete religious toleration ....... (   )
    3. New York        Nathaniel Bacon ..................... (   )
    4. Virginia        Cod-fishing ......................... (   )
    5. Georgia         Duke of York ........................ (   )
                       Governor Berkley .................... (   )
                       Largest colonial city ............... (   )
                       James Oglethorpe .................... (   )
                       James Otis .......................... (   )
                       College of William and Mary ......... (   )

II. 1. 1620            British Navigation Act .............. (   )
    2. 1643            New England Confederation formed .... (   )
    3. 1660            Invention of cotton gin ............. (   )
    4. 1763            Lewis and Clark Expedition .......... (   )
    5. 1793            End of French and Indian War ........ (   )
    6. 1804
    7. 1825

If you finish before the time is up,
complete Part I or go on to Part III.

Number right ______ (Score)


Part III

Directions. Below are several statements and
questions, each of which is followed by five
phrases. Mark in the parentheses the number
of that phrase that correctly completes the
statement or answers the question. (One, and
only one, phrase is correct in each case.)

Thirty-five minutes.

Sample.

a. One of the principal products of Colonial
New York was

1. Rice   2. Indigo   3. Flour   4. Gold
5. Aluminum ( 3 )

1. By the Molasses Act of 1733 England sought

1. to drive the colonial rum manufacturers out
of business.

2. to safeguard the interests of the British
West Indian planters.

3. to ruin Dutch shipping.

4. to bring on a war with Spain.

5. to encourage the slave trade. ( )

2. The Shenandoah Valley was principally settled
by

1. the Dutch   2. Irish Catholics
3. German and Scotch-Irish
4. Scandinavians   5. Huguenots ( )

If you finish before the time is up,
complete Parts I and II, or go on to
Part IV.

Number right ______ (Score)

Part IV

Directions. In each of the blanks at the right
put the word or shortest phrase that will complete
the sentence correctly. Write carefully and
clearly. Fifteen minutes.

Sample.

a. The name of the first permanent English
settlement in America was ( Jamestown )

1. The name of the author of the pamphlet
entitled "Common Sense" was ( )

2. The system by which nations endeavored
to sell much and buy little, in order to
secure a so-called "favorable balance"
of trade, was known as ( )

If you finish before the time is up,
look over all four parts and correct
any mistakes you have made.

Number right ______ (Score)
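Part I of the sample is scored "right minus wrong," with omitted items counted neither way, which is why its directions forbid guessing. The following is a minimal Python sketch of that tally; the answer key and responses in the usage lines are hypothetical and are not the published answers to the sample items.

```python
def score_true_false(responses: list[str], key: list[str]) -> dict[str, int]:
    """Tally a true-false part and score it as right minus wrong.

    responses: "+" for true, "o" for false, "" for a blank (omitted) item.
    key: the correct mark ("+" or "o") for each item.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    right = wrong = omitted = 0
    for given, correct in zip(responses, key):
        if given == "":
            omitted += 1  # blanks neither gain nor lose credit
        elif given == correct:
            right += 1
        else:
            wrong += 1
    return {"right": right, "wrong": wrong, "omitted": omitted,
            "score": right - wrong}


# Hypothetical five-item key and one pupil's marks.
key = ["o", "o", "+", "o", "+"]
marks = ["+", "o", "", "o", "+"]
print(score_true_false(marks, key))
# {'right': 3, 'wrong': 1, 'omitted': 1, 'score': 2}
```

Because a blind guess on a two-choice item is as likely to be wrong as right, subtracting the wrong answers cancels, on the average, the credit that guessing would otherwise earn.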


6) Conclusion. Standardized tests have a legitimate
place in high school measurement in history and
should not be wholly displaced by other types of
measurement. Certain advantages accrue from
standardized tests which the others cannot supply.
They are characterized by the care exercised in
their making, by the objectivity of their scoring,
by their comprehensiveness within the field measured,
by the universal character of their content, and by
the fact that they are accompanied by norms, or
standards of achievement.

Standardized tests are generally more objective,
reliable, and valid than either the traditional-type
or the new-type examination. Also, their use saves
time in both preparation and scoring. The equivalent
forms provided by most good standardized tests insure
greater accuracy in measuring progress.

On the other hand, the standardized test should not
be the sole element in a measurement program. It has
its limitations as well as its advantages. Among the
former is the serious one that standardized tests are
not flexible enough to meet local needs
satisfactorily; they are not adaptable to local
school curricula.

History test-makers have had difficulty in devising
exercises that test a variety of valuable mental
processes. Too many of the exercises now available
test memory only.




Chapter  7 


Teaching History through the Use of the Objective or
New-Type Examination.

1) Nature of the new-type examination.

2) Advantages of the new-type examination.

3) Disadvantages of the new-type examination.

4)  Five  most  generally  useful  varieties  of  the 
new-type  examination  forms. 

5)  Conclusion. 




Chapter  7 

Teaching History through the Use of the Objective or
New-Type Examination.

1) Nature of the new-type examination. The new
examination, or, better, the new-type test, is the
name commonly given to tests or exercises, generally
constructed by a teacher for her own use, that make
use of the forms and scoring methods of standardized
tests so as to possess relatively high objectivity,
but that have not gone through a process of careful
trying out of the material included, have not been
given to large numbers of pupils, and are not
generally available for use by others.

The  new-type  tests  include  true-false  state- 
ments and  yes-no  questions,  single-answer  questions, 
multiple-answer  exercises,  matching  exercises,  com- 
pletion statements,  and  other  similar  types. 

No sharp distinction can be drawn between the
standardized test and the new-type test, since there
are tests in all stages of development, from an
ordinary new-type test constructed by a teacher for
use with a single class up to the thoroughly
standardized test. Both are thought of as opposed to
traditional examinations, the primary difference
being that they call for very brief pupil responses
and are objective, or nearly so.

The aim of this chapter is to state as fully as
possible both the advantages and the disadvantages of
the new-type examination; to illustrate the varieties
of new-type examination forms which are most valuable
and which, perhaps, receive the widest use; and,
finally, to draw a conclusion for the entire chapter.

2) Advantages of the new-type examination.

The merit of new-type examinations which is probably
most often stated first is that they are more reliable
than those of the traditional variety. New-type tests
permit pupils to respond to a great many more items or
exercises than do traditional examinations consuming
the same amount of time. Therefore they yield much
better and more comprehensive samplings of pupils'
achievement, and so result in more reliable marks.
When a pupil must respond to a large number of
exercises, each of which calls for a response more or
less distinct from that of any other, it is very
unlikely that he will just happen to know, or not to
know, a much larger proportion of them than is true
for the total amount of subject matter covered by the
test.
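The sampling argument can be made concrete under a simplified model that the text does not itself state: suppose each exercise is an independent draw from the field, and the pupil knows a fixed proportion p of that field. The proportion he answers correctly then has a binomial standard error of sqrt(p(1 - p)/n), which shrinks as the number of exercises n grows. A minimal Python sketch, with p = 0.7 and the exercise counts chosen only for illustration:

```python
import math


def sampling_error(p: float, n_items: int) -> float:
    """Standard error of the observed proportion correct on n_items exercises.

    Assumes a simple binomial model: items are independent draws from the
    field, and the pupil knows a fixed proportion p of it.
    """
    if not 0.0 <= p <= 1.0 or n_items < 1:
        raise ValueError("p must lie in [0, 1] and n_items must be positive")
    return math.sqrt(p * (1 - p) / n_items)


# A few essay questions versus many short exercises (hypothetical counts).
for n in (5, 20, 100):
    print(n, round(sampling_error(0.7, n), 3))
```

Under this model, five exercises leave an error of about .20 in the observed proportion, while a hundred reduce it to about .05; quadrupling the number of exercises halves the error.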

The second of the two chief causes of the higher
reliability of new-type tests is that they are
objective, or nearly so, in their scoring. The answers
to most questions of the traditional type cannot be
scored as definitely right or wrong, but may be
partially right to almost any degree. As a result,
great differences of opinion exist among supposedly
competent teachers as to how much credit should be
allowed for the same answers.

Some rather convincing evidence concerning
1
reliability is presented by Ruch, who is one of the
leading advocates of the new type. In one place he
gives figures showing the reliability coefficients of
eight New York Regents' Examinations in their ordinary
or subjective form, and likewise of the same
examinations when converted into objective form. The
average coefficient of reliability for the objective
form was .65, whereas the corresponding average for
the subjective form was only .42. If a correction were
applied to offset the fact that pupils spent more time
working on the subjective than on the objective form,
the figure of .65 should be raised to .69. The
justification for making such a correction is as
follows: if a test is lengthened by adding more of the
same type of exercises as compose the original
portion, and other conditions are in no way changed,
its reliability is increased. This increase is due to
the fact that making it longer causes it to yield a
more satisfactory sampling of the total field covered.
Such an increase in the length of a test results in
increasing its reliability by a ratio equal to the
square root of the ratio of its length after the
additional exercises have been added to what it was in
the first place. For example, if enough similar
material is added to a test to make it four times as
long as it was originally, the reliability of the
lengthened test is twice as great as that of the first
one, since the square root of four is two.

1. Ruch, G. M., The Objective or New-Type Examination,
pp. 23-53.
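Taken literally, the square-root rule above cannot hold for large lengthenings, because a reliability coefficient cannot exceed 1: quadrupling a test whose reliability is .65 would give 1.30. The correction usually applied to a lengthened test is the Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), where r is the original reliability and k the factor by which the test is lengthened. The Python sketch below swaps in that formula; the text does not say which correction Ruch used, and the function names are illustrative.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability after lengthening a test k times with comparable items."""
    if not 0.0 < r < 1.0 or k <= 0:
        raise ValueError("r must lie strictly between 0 and 1, and k must be positive")
    return k * r / (1 + (k - 1) * r)


def length_factor(r: float, target: float) -> float:
    """Lengthening factor k the Spearman-Brown formula needs to raise r to target.

    Obtained by solving target = k r / (1 + (k - 1) r) for k.
    """
    if not 0.0 < r < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("r and target must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


print(round(spearman_brown(0.65, 4), 3))     # 0.881
print(round(length_factor(0.65, 0.69), 2))   # 1.2
```

Under this formula, raising .65 to .69 would correspond to lengthening the objective form by roughly a fifth, and even a fourfold lengthening yields about .88 rather than 1.30.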

It would be easy to quote dozens, probably even
hundreds, of reports of results which agree with those
of Ruch, that is, which show greater reliability for
new-type than for traditional examinations. There are,
however, a few cases in which data have been obtained
which indicate an opposite conclusion. Thus Crawford
1
and Raynaldo conclude from their experiments that
fifteen out of twenty comparisons indicate that
traditional examinations possess greater reliability
than true-false tests. They state, however, that the
true-false tests used were made by persons
comparatively unskilled in so doing, and also that the
students upon whom the tests were tried out were not
familiar with the true-false form. Furthermore, in all
the comparisons except one the traditional
examinations preceded the others, and in the one in
which the true-false test was, by accident, given
first, it showed itself distinctly more reliable than
the discussion examination that followed. After
mentioning several other factors, their conclusion is
that the data they present are not sufficient to
warrant a general statement as to which type of test
is superior. However, there seems to be little
reasonable doubt that, if new-type tests and
discussion examinations are constructed with the same
degree of care and expertness, and if the pupils spend
the same amount of time working on each, the results
on the former will be decidedly more reliable than
those on the latter. It is probable that, if several
varieties of the new-type examination are combined
into one test, the resulting reliability of the total
scores will be even greater than if only one form is
used. In many of the investigations the comparisons
have been made on the basis of a single form only,
or, if on several forms, the figures have been
reported separately for each.

1. Crawford, C. C., and Raynaldo, . A., "Some
Experimental Comparisons of True-False Tests and
Traditional Examinations," School Review, 33:698-706,
Nov. 1925.

Pupils feel that many of the ratings they receive on
traditional examinations are really unreliable and
unjust. On the other hand, the comparatively high
objectivity of new-type tests produces a distinctly
favorable reaction. It makes pupils much more
satisfied with the marks they receive, and enables
teachers to justify marks to pupils and their parents
much more easily. When pupils compare papers with one
another, they see that the same response has been
scored in the same way no matter on whose paper it
occurred, and thus their confidence in the meaning of
marks and in the reliability of those they receive is
increased. Furthermore, near-objectivity makes it
possible for pupils to score their own answers, or
those of one another, on many occasions, and thus to
save the teacher considerable work.

Not only because of objectivity in scoring, but for
other reasons also, pupils tend to prefer new-type
tests. It is true that much depends upon the attitude
of the teacher and how the tests are presented to the
pupils, but, if the teacher is not prejudiced against
the new type, pupils will almost always favor it.
1
Kinder reports, for example, that of more than two
hundred students, all but seven preferred the new type.
2
Brinkley found that pupils preferred a mixture of
essay-type and new-type questions to all of either one
alone. Other reasons why new-type examinations are
preferred are: that results can usually be known soon
after the tests are taken; that less nervousness and
fear are aroused; that there is little danger of
answers being misunderstood; that the personal likes
or dislikes of teachers have practically no
opportunity to affect scores; and that there is no
visual or writing strain.

1. Kinder, J. S., "Supplementing Our Examinations,"
Education, 45:557-566, May 1925.

2. Brinkley, S. G., "Values of New-Type Examinations
in the High School with Special Reference to History,"
Teachers College, Columbia University, Contributions
to Education, No. 161, p. 59, 1924.

A fact more or less dependent upon reliability, and
yet distinct from it, is that new-type tests possess
greater validity than do discussion examinations. This
is caused by their greater objectivity and
reliability, and also by the fact that the pupils'
answers are very little affected by such factors as
their ability in English, their handwriting, and so
forth. That is to say, the answers are indicative of
their knowledge of the subject matter covered, and not
of extraneous abilities which may enter into their
answers on essay examinations.

Many of the same writers who have dealt with the
question of reliability have also submitted data
1
regarding validity, as have others. McAfee found
correlations of .75 and .79 for new-type tests with a
composite measure made up of both new-type and
traditional examination marks, standardized test
scores, and teachers' marks. The correlation of
discussion examination marks with the same composite
was only .66.

1. McAfee, L. O., "The Reliability of Non-Standardized
Point Tests," Elementary School Journal, 24:579-585,
April 1924.
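Validity figures such as McAfee's are product-moment correlations between a test's scores and a criterion, here a composite of several measures. The following is a minimal Python sketch of the Pearson coefficient; the five pairs of scores are hypothetical, and since the text does not say how McAfee weighted the parts of his composite, the composite is simply supplied as a list.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no spread")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five pupils: a new-type test and a composite criterion.
test_scores = [1, 2, 3, 4, 5]
composite = [2, 4, 5, 4, 5]
print(round(pearson_r(test_scores, composite), 2))  # 0.77
```

A higher correlation with the composite is read as higher validity, on the assumption that the composite is a fairer measure of achievement than any single examination.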




In the case of validity, as in that of reliability,
not all those who have studied the question are in
entire agreement. One of the most careful investigations
reported is that of Brinkley,¹ who concluded that with
tests of equal length, as measured by time spent in
testing, and prepared by teachers trained in test
construction, one type of test yielded practically as
good results as another for measuring senior high school
achievement in history. With one or two exceptions, he
found this true whether the achievement measured was
general achievement for the course, ability to think
with the materials of the course, or information. He
also states that, for measuring general achievement in
history, essay examinations are more valid than new-type
tests prepared by ordinary high school teachers, and even
slightly more valid than those prepared by teachers
trained in the construction of the new type. In the case
of new-type tests prepared by Brinkley himself, the
validity equalled that of essay examinations. For
measuring ability to think, he found that the two types
possessed about the same validity, and for measuring
stock of information, that the new type was slightly
more valid.

1. Brinkley, S. G., "Values of New-Type Examinations in
the High School with Special Reference to History,"
Teachers College, Columbia University, p. 56.




Odell¹ writes the following: "It seems to me
that no general conclusion can be drawn as to which
type of examination is more valid. The purpose which
an examination is intended to serve must be taken
into account. For the measurement of stock of information
and knowledge of facts, the evidence seems to
support the statement that the new examination is
more valid than the discussion type. For the measurement
of other outcomes of instruction, the data available
at present do not warrant the statement that the
new-type examination is known to be superior to the
traditional type. In other words, each has its particular
place and its special functions where it should
be preferred to the other."

Since scores upon objective or near-objective
tests are helped very little by knowledge that has
something to do with the point at issue but does not
specifically include it, their use tends to lead
pupils to acquire relatively exact and detailed
knowledge. Thornton² says, "It is evident that pure
knowledge must be measured apart from the other factors
of historical ability. Only one ability can be tested at
a time. Since all teachers will desire to measure
information, as well as the other factors, a possible
use for informational tests at once appears. In history
the specific information must be called for in the
test, if we are to know whether the pupil has it or not."

1. Odell, C. W., op. cit., p. 196.

2. Thornton, E. W., "The Use of Informational Tests in
American History Teaching," 20: 12-16, Jan. 1929.

New-type examinations also point out very
definitely the particular things which are not known,
and thus pave the way for definite and purposeful
remedial instruction. Objective tests are felt to have
made a real and valuable contribution toward the
improvement of instruction: the teacher studies the
test paper as a physician does the findings of his
thermometer and stethoscope, and then, having diagnosed,
proceeds to modify instruction to remedy the weak
places revealed. An essay examination may reveal a
general lack of knowledge on a certain topic, but it
rarely points out the exact points which need
attention.
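The diagnostic reading of new-type papers described above amounts to listing, for each pupil, the items missed, and counting across the class which items were missed most often. The following is a minimal sketch under stated assumptions: the item numbers, answer key, pupils, and the one-half threshold for a class-wide "weak place" are all hypothetical. It is not a procedure given by any author cited here.

```python
from collections import Counter

def missed_items(key, answers):
    """Item numbers a pupil answered wrongly or left blank."""
    return [item for item, correct in key.items() if answers.get(item) != correct]

def class_weak_spots(key, papers, threshold=0.5):
    """Items missed by at least `threshold` (a fraction) of the class, most-missed first."""
    misses = Counter()
    for answers in papers.values():
        misses.update(missed_items(key, answers))
    n = len(papers)
    return [(item, count) for item, count in misses.most_common() if count / n >= threshold]

# Hypothetical answer key and papers for a four-item test.
key = {1: "D", 2: "D", 3: "+", 4: "-"}
papers = {
    "Pupil A": {1: "D", 2: "A", 3: "+", 4: "-"},
    "Pupil B": {1: "B", 2: "C", 3: "+"},          # item 4 left blank
    "Pupil C": {1: "D", 2: "B", 3: "-", 4: "-"},
}
print(missed_items(key, papers["Pupil B"]))  # [1, 2, 4]
print(class_weak_spots(key, papers))         # [(2, 3)]
```

Here item 2 was missed by every pupil, so it marks a topic for re-teaching to the whole class. Pupil B's individual list points to remedial work for that pupil alone.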

Pre-tests in the objective form are now given
to history classes by many teachers.

One high school teacher¹ says, "The great value
of the pre-test is that it acquaints the teacher with
at least a part of the attitudinal background, or
the informational background, which each student
brought to the subject. Thus the intelligent use of
this knowledge might very well determine which groups
of facts or ideas would need to be stressed and which
not. A true-false pre-test can well be used for this
purpose."

1. Everett, S., "Objective Tests the Best Discoverer of
Pupil Attitudes," The Historical Outlook, 20: 335-337,
Nov. 1929.




Kepner¹ writes that the primary purpose of this kind
of test is diagnostic. It aims to measure objectively
the extent of the background which a pupil or group
of pupils brings to a subject in the secondary
school.

New-type tests not only aid the teacher in
diagnostic and remedial work, but also make it easier
for pupils to check the results of their own study.
It is not very difficult for a pupil to determine
whether or not he knows certain facts definitely,
and, if he finds that he is ignorant of some of them,
to devote further study to those not known. For this
reason, new-type tests provide better motivation for
study than do discussion examinations.

Workbooks in history have come greatly to the
forefront in the past two or three years. Wesley²
divides them into two classes, the general and the
specific.

A workbook of the general type is organized on
a topical or chronological basis, and usually contains
citations to various texts. So far, workbooks of this
type seem to have been rather few in number.

A workbook of the specific type is based upon
a published text and parallels it in organization.
The specific workbook usually contains topics, problems,
projects, exercises, maps, drills, and tests
which are based upon material found in the textbook.
Some of the specific workbooks contain study-guide
sections covering the reading matter in the text.
The specific workbook is intended to facilitate the
mastery of a specified text.

1. Kepner, T., "An Aspect of History Testing,"
The Historical Outlook, 15: 414-417, Dec. 1924.

2. Wesley, E. B., "Workbooks in the Social Studies,"
The Historical Outlook, 22: 151-154, Apr. 1931.

The idea behind the workbooks is to make the
material of the social studies more concrete by
paralleling, so far as possible, the scientific
method of approach.

Workbooks do tend to make the content of the
social studies more definite. The student knows
what to hunt for, and the teacher knows what to ask
when the class assembles.

The workbooks are designed to absorb at least
some of the functions of textbooks, mapbooks,
notebooks, scrapbooks, reading books, and written
reports.

It would appear that the utility of workbooks
increases up to and including the second year in high
school, and that it probably declines after the third
year.

It is probable that the general workbook can be
adapted to more advanced students than the specific
workbook, for the former can be less mechanical and can
allow greater leeway in procedure.

The tests which are included in workbooks cover
the chapters, and are probably superior to those usually
made by the teacher. They are not standardized, and
afford nothing more than an objective basis for
marking a part of the work, but that is a decided
merit.

One point in which new-type examinations possess
considerable advantage over traditional ones is
the ease of scoring. Careful and accurate scoring of
the answers to traditional questions is relatively
difficult and requires considerable time. The scoring
of new-type tests is made easy by preparing a list of
correct answers, which can usually be put in such form
that it can be matched directly with the pupils'
responses. Not only is time saved, but the mental
activity engaged in while scoring is much less arduous
and tiring than in the case of essay examinations.
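Because scoring a new-type paper is only a matter of matching each response against the list of correct answers, the work is mechanical enough to be written out step by step. The sketch below assumes a hypothetical five-item alternative (true-false) test marked with "+" and "-" signs. It also assumes that an omitted item is counted separately, neither right nor wrong, as the "do not guess" directions imply. The key and the pupil's paper are invented.

```python
from dataclasses import dataclass

@dataclass
class Score:
    right: int
    wrong: int
    omitted: int

def score_paper(key, answers):
    """Compare a pupil's answers with the key; unanswered items count as omitted."""
    right = wrong = omitted = 0
    for item, correct in key.items():
        response = answers.get(item)
        if response is None:
            omitted += 1
        elif response == correct:
            right += 1
        else:
            wrong += 1
    return Score(right, wrong, omitted)

# Hypothetical key for a five-item alternative test: "+" true, "-" false.
key = {1: "+", 2: "-", 3: "-", 4: "+", 5: "+"}
paper = {1: "+", 2: "+", 3: "-", 5: "+"}   # item 4 left blank
print(score_paper(key, paper))  # Score(right=3, wrong=1, omitted=1)
```

Nothing in the procedure calls for knowledge of the subject-matter, which is why a clerk, or the pupils themselves, can do the scoring.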

If it is desired and practicable, a clerk
or other person who does not possess any particular
knowledge of the subject-matter covered can score
most new-type examinations satisfactorily.

It is also frequently possible to have satisfactory
scoring done by the pupils themselves, who may
mark their own papers or those of others. They can
easily see just what their errors are and also learn
how to correct them. In most cases it is not
necessary for the teacher to give very much help in
this respect, if the pupils are properly motivated
so that they have formed the habit of studying their
returned test papers and trying to profit as much as
possible thereby. They will ordinarily gain much more
benefit from unaided or only slightly aided study of
new-type test papers than from that of traditional
examination papers, since it is impossible for pupils
to deceive themselves as to the correctness and quality
of their answers.

Because constructing a fairly large number of
new-type exercises or items calls for more thought
than writing a few essay questions, teachers are
frequently more careful and thoughtful in doing so.
This, of course, also means that more time is required
to construct an objective or near-objective test than
an essay examination consuming the same testing time.

For a very small class this extra time will
ordinarily more than offset that gained in scoring,
but for a class of twenty-five, thirty, or forty pupils
this will rarely occur, and it is likely that the time
saved in scoring new-type tests will balance or more
than balance the extra time required in their
construction. Even if the total amount of time required
is the same, this should be considered a merit of
new-type examinations, because a greater proportion of
it is spent on construction and less on scoring. In
other words, a teacher spends more time in considering
her general objectives, methods, and so forth, and less
in what is largely drudgery and mere clerical work.
The quality of examinations should therefore be
improved.
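The trade-off between construction time and scoring time can be made explicit. Assume, purely for illustration, that C_N and C_E are the times needed to construct a new-type test and an essay examination, that s_N and s_E are the times needed to score one paper of each kind, and that the class has n pupils. None of these symbols appears in the sources; the sketch below shows the reasoning only. Given C_N > C_E and s_N < s_E, the new-type test costs no more total time once the class reaches a break-even size n*:

```latex
T_N(n) = C_N + n\,s_N, \qquad T_E(n) = C_E + n\,s_E, \qquad C_N > C_E,\; s_N < s_E

T_N(n) \le T_E(n)
  \iff C_N - C_E \le n\,(s_E - s_N)
  \iff n \ge n^{*} = \frac{C_N - C_E}{s_E - s_N}

% Invented figures, in minutes: C_N = 180, C_E = 60, s_N = 2, s_E = 8
n^{*} = \frac{180 - 60}{8 - 2} = 20 \text{ pupils}
```

With these invented figures, a class of twenty-five, thirty, or forty pupils lies above the break-even point, while a very small class lies below it. This agrees in form with the argument above, though the actual times will differ from teacher to teacher.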

It is possible to prepare two or more new-type
tests over the same subject-matter which are very
nearly equivalent in difficulty, a thing which is
practically impossible with essay examinations. If it
is desired to give a new-type test of, say, forty
items, and to have two forms of the test, eighty items
should be prepared. These should then be divided by
some random or chance method into two lists of forty
items each. The two tests will, in most cases, not
be of exactly the same difficulty, but it will be
unusual if the difference in difficulty between them
is more than a very few per cent.
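The chance method of dividing eighty items into two forms of forty can be sketched as a shuffle followed by a split down the middle. The function name and the seed are illustrative assumptions. Shuffling is one possible chance method among several, such as drawing slips from a hat.

```python
import random

def split_into_forms(items, seed=None):
    """Randomly divide an even-sized pool of items into two equal-length forms."""
    if len(items) % 2:
        raise ValueError("need an even number of items")
    rng = random.Random(seed)   # a seeded generator makes the division repeatable
    shuffled = list(items)      # copy, so the original pool is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool: items numbered 1-80, as in the example of two 40-item forms.
pool = list(range(1, 81))
form_a, form_b = split_into_forms(pool, seed=1932)
print(len(form_a), len(form_b))  # 40 40
```

Every item lands in exactly one form. Because the division is made by chance, easy and hard items tend to be spread about evenly between the two forms, which is why their difficulty usually differs by only a few per cent.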

Moreover, in many cases new-type examinations
can be used over again with comparatively slight
modifications. Since the number of items contained
is comparatively large, pupils cannot expect to make
high scores by studying a very small portion of the
subject-matter, as would be the case if only a few
discussion questions were to be repeated.

Mitchell¹ is of the opinion that "early in the
term practice tests should be provided to accustom
students to the new-type questions. The first may be
a test dealing with ability to follow directions.
It should be brief, not more than ten items, and
can be corrected at once by the pupils. The second
test may well be one of reading comprehension."

3) Disadvantages of new-type examinations

One of the disadvantages of the new-type
examination is that it is harder to make and to give
than the traditional type. Since it is composed of
more questions or exercises, the teacher takes more
time to prepare it. It is probably true, also, that
the mental effort required in its construction is
greater, although this may not hold if the person
making it is equally familiar with both types of test,
and if the traditional examination constructed is of
as high a degree of merit as the other.

Moreover, new-type tests generally require that
a copy be placed in the hands of each pupil in order to
be effective. This requirement is frequently a very
practical hindrance to their use, since it is sometimes
absolutely impossible, and frequently decidedly
difficult, for a teacher to provide the necessary
number of copies. Many schools, especially small ones,
do not have any sort of device, such as a mimeograph
or hectograph, by which a number of copies can be
made. Even in schools which do have such devices, it
is not always easy to secure the desired number of
copies. For a very small class it may be practicable
to use carbon copies made upon the typewriter, but for
classes of ordinary size that is hardly practicable,
requiring too great an amount of labor.

1. Mitchell, Elene, Teaching Values in New-Type History
Tests, p. 95.

Although much has been said and written concerning
the new-type tests, many teachers have scarcely
heard of them, and many others do not understand
their purposes, limitations, and administration.

It seems probable that those who favor the
traditional examination are correct in their assertion
that it tests reasoning and most other thought
processes except memory better than do new-type tests.
The latter tend to measure only knowledge of facts
acquired, and that often in rather disconnected
fashion. Such qualities and mental activities as
originality, initiative, organization, interpretation,
analysis, discrimination, judgment, subtlety, and so
forth are, it is said, only slightly if at all
measured by new-type examinations.




Krey¹ writes, "Thus far, we have not been able
to gain any help from the objective tests in arriving
at opinions of 'subjective qualities' of the students."

In most varieties of the new-type examination
pupils are in some form or other given a number of
possible answers from which to select the correct ones.
In other words, they are not thrown upon their own
resources to the same extent as by discussion questions.
Knowledge which is only marginal or hazy is
frequently quickened sufficiently by the suggested
answers that the correct responses are recognized,
whereas there is usually no result of this sort in
connection with the essay examination. The argument
may be made, however, that it is not altogether
undesirable that some tests secure more or less
suggested responses. On the whole it appears that
new-type tests are inferior to discussion examinations
on this point, but not so far inferior as has
sometimes been asserted.

A strong objection commonly made to the new
examination is that it encourages guessing. This is
especially charged against alternative tests, in which
pupils know that they have one chance out of two of
guessing right in each particular case, but also to
some extent against multiple-answer tests, matching
tests, and several other varieties. It is undoubtedly
true that it is easier for pupils to give brief
responses than to write rather long discussions, if they
know little or nothing about the matter at issue
in either case. However, if the tests are properly
administered, including satisfactory directions for
the pupils, and also properly scored, it is not
apparent that the amount of guessing which occurs
is great enough to be a very serious fault. One item
in satisfactory directions should be a statement
strongly advising or directing pupils not to guess,
that is, not to record an answer unless they are at
least fairly sure it is correct. However, if one
wishes to obviate the possibility of pupils profiting
by guessing in spite of the ordinary scoring methods
to be described later, he may well do something of the
sort suggested by Christensen.² After one type of test,
for example a true-false one, has been given, it is
followed fairly soon, that is, within a day or two, by
one of another type, perhaps multiple-answer, covering
the same material and even corresponding item for item
with the first test. The two tests are then scored
together, and credit is given only for those items
correctly answered in both. Even apart from its
tendency to reduce guessing, such a repetition is
occasionally desirable.

1. Krey, A. C., "What Does the New-Type Examination
Measure in History," The Historical Outlook, 19: 159-162,
April 1928.

2. Christensen, A. M., "A Suggestion as to Correcting
Guessing in Examinations," Journal of Educational
Research, 14: 370-374, Dec. 1926.
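Christensen's double test can be sketched as a scoring rule that gives credit for an item only when it is right on both of the corresponding tests. The keys, the pupil's answers, and the function name below are hypothetical. The closing arithmetic assumes a pupil who guesses blindly and independently on a true-false item and on a five-answer multiple-answer item. That pupil's chance of earning credit falls from one in two to one in ten.

```python
def combined_score(key_tf, answers_tf, key_mc, answers_mc):
    """Count items answered correctly on BOTH of two tests that correspond item for item."""
    if key_tf.keys() != key_mc.keys():
        raise ValueError("the two tests must correspond item for item")
    return sum(
        1
        for item in key_tf
        if answers_tf.get(item) == key_tf[item] and answers_mc.get(item) == key_mc[item]
    )

# Hypothetical keys and one pupil's answers for three corresponding items.
key_tf = {1: "+", 2: "-", 3: "+"}   # true-false test: "+" true, "-" false
key_mc = {1: "B", 2: "D", 3: "A"}   # multiple-answer test, letters A-E
tf = {1: "+", 2: "-", 3: "-"}
mc = {1: "B", 2: "C", 3: "A"}
print(combined_score(key_tf, tf, key_mc, mc))  # 1 (only item 1 is right on both)

# Blind, independent guessing on both tests (an assumption):
p_tf = 1 / 2   # one chance in two on a true-false item
p_mc = 1 / 5   # one chance in five with five suggested answers
print(p_tf * p_mc)  # 0.1
```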




An objection frequently urged against new-type
tests is that they, or at least several of their most
common varieties, tend to confuse the pupil
as to what he really knows, or even to teach him
erroneous facts. This charge is especially made
against true-false exercises, since practically half
of the statements contained therein are false. It
is also made against the multiple-answer type, in
which several of the suggested answers are incorrect;
against the matching type, in which the pupil may
form wrong combinations which will tend to remain
in his memory; against the incorrect-statement type,
for the same reason as in the case of the true-false
type; and so forth. In reply to this criticism there
are at least three points to be made.

In the first place, tests should not cover or
present material which the pupils have not already
studied and supposedly learned in the correct form.
If something has already been well learned, this
knowledge will not be disturbed or confused by
seeing a false statement concerning the matter.

In  the  second  place,  it  may  be  true  that  some 
confusion  regarding  facts  only  partially  mastered 
may  be  caused,   but  this  should  be  satisfactorily 
taken  care  of  and  corrected  by  the  teacher  in  her 
discussion  of  the  errors  made  upon  the  test. 




Finally, it should be recognized that in life
outside the school individuals are very frequently
called upon to distinguish true statements from
false ones and valid arguments from invalid ones, to
select the best of several possibilities, or to do
something else closely resembling some variety or
other of new-type test. It is therefore highly
desirable that some training along these lines be
given pupils in school, and it is eminently worthwhile
to risk the confusion of ideas and knowledge which
may occur to a limited degree, in the endeavor to
avoid much more serious confusion later and to
develop critical ability in the ordinary affairs
of life.

The statement has been made that objective
or near-objective tests are too artificial, in that
they do not resemble life's situations or problems
in one important particular: the problems met in life
outside the school rarely have one and only one
correct solution with all others wrong. Instead
there are frequently several solutions of approximately
equal merit, or several of which one is slightly better
than another, the second slightly better than a
third, and so on. It is therefore argued that pupils
should not become accustomed to looking for answers
or solutions which are absolutely right or wrong,
but should be trained as much as possible in dealing
with situations in which all or practically all of
the elements or factors are subjective. Although
there is some truth in this contention, it is not
apparent that traditional examinations as ordinarily
administered are of much if any more value in giving
training of the kind desired than are new-type tests.
Probably, if traditional examinations were administered
with this end in view, they could be made to yield
considerably greater returns along this line than is
true at present, and also greater ones than would
come from objective tests. On the other hand, some
varieties of the latter, such as the multiple-answer
type with several answers of varying degrees of merit,
do give training of the type specified above.




4) Five most generally useful varieties of new-type
examination forms

1. Single-answer test.

Directions: The correct answer to each question below
is a single word. If you know, or think you know, the
answer to a question, write it upon the blank line
in front of that question. Do not write more than one
word upon any line.

______ 1. What other term was applied to the
"Carpet-baggers" of the North?

______ 2. Who was the author of the Declaration
of Independence?

2. Multiple-answer test.

Directions: Each of the questions below is followed
by five suggested answers, of which one is right. If
you think you know which one is right, place the
letter before it on the short blank line in front
of that question.

___ 1. The most important modern state to
adopt a policy of free trade was
A. The United States. B. Russia.
C. Italy. D. England. E. France.

___ 2. Of the following men, the one who is
famous as a political philosopher is
A. John Law. B. Eli Whitney. C. Richard
Arkwright. D. John Locke.
E. Sir Walter Raleigh.

3. Alternative Test.

Directions: Below are a number of statements, of
which about half are true and the other half false.
In the case of each statement that you think is true,
place a plus (+) sign on the blank line in front of
it, and in the case of each that you think is false,
place a minus (-) sign. To show just how this is to
be done, the first two sentences have been marked.
The first one is true, so it has a plus sign in
front of it, and the second is false, so it has a
minus sign before it. If you do not think you know
whether a statement is true or false, do not guess,
but omit it and go on to the next one.

__+__ 1. Under the Articles of Confederation, Congress
had no power to collect taxes directly from
individuals.

__-__ 2. The acquisition of Louisiana more than
tripled the territory of the United
States.

_____ 3. President Wilson promised the Filipinos
that the United States would grant them
complete independence before 1930.

4. Completion Test.

Directions: Each of the blanks in the paragraph
below represents the omission of one word. If you
think you know the word that should be there, write
it on the blank. Do not in any case write more than
a single word on one blank.

Our Federal Government has three branches, the ________,
the ________, and the ________. The President
is at the head of one branch, Congress at that of
another, and the ________ at that of the third.
The President is assisted by his ________, in which
there are ________ members. Congress consists of
________ Senators from each State and Representatives
whose number is determined by the ________
of the various States.


5. Matching Test.

Directions: Below is a group of items, divided
into two columns. Each item in the left-hand column
is numbered. Each item in the right-hand column is
followed by parentheses. Place in the parentheses the
number of the item in the left-hand column that
matches the item in the right-hand column.

Left-hand column:
1. Spanish-American War.
2. Organized labor.
3. Sewing machine.
4. Mormons.
5. Civil Service Reform.
6. Railroads.
7. Southern Confederacy.

Right-hand column:
Brigham Young (  )
Elias Howe (  )
James J. Hill (  )
Alexander H. Stephens (  )
Samuel Gompers (  )




Mitchell¹ recommends that "especially in the
social studies it is important to adapt the form
of question to the material used, and to phrase
questions to emphasize associations with other
facts instead of with mere words."


1. Mitchell, Elene, Teaching Values in New-Type
History Tests, p. 63.




5)  Conclusion  

New-type tests are, as is shown by a considerable
mass of evidence, more reliable than traditional
examinations, both because they secure better
samplings of pupils' ability and knowledge, and because
their scoring is relatively objective.

Among the other advantages which they possess
are that pupils usually prefer them and are better
satisfied with the marks which they receive; emphasis
is placed upon exact and accurate knowledge; knowledge
of the subject being tested is measured without being
mixed with ability in language, handwriting, and so
forth; speed can be measured when desired; scoring is
easier; more thought is usually required in their
construction; bluffing is more difficult; and two or
more forms of practically equivalent difficulty can
be prepared.

Pre-tests  in  the  new-type  form  are  being  used 
for  diagnostic  purposes,  and  the  results  from  these 
tests  are  used  for  the  improvement  of  instruction 
through  diagnosing  the  shortcomings  and  difficulties 
of  pupils. 

Practice  tests  are  being  urged  to  train  the 
pupil  beforehand  in  the  use  of  new-type  tests. 

Workbooks  with  objective  tests  therein  have 
come  greatly  to  the  forefront. 




The five most generally useful varieties of
new-type test forms are single-answer, multiple-answer,
alternative, completion, and matching.




Chapter  8 

Summary 



The value of educational measurement in general
is no longer a matter of doubt. Immeasurable benefits
to all pupils and persons related to the schools,
including the community at large, have come from the
results of testing.

The past quarter of a century has witnessed
the rise of educational measurement to the plane of
conscious striving for objective, impartial, and
comparative means of portraying the absolute and
relative achievement of pupils. The measurement of
achievement has admittedly been the principal reason
for examinations. All seem to be agreed that the
first purpose of a test or examination is that of
ascertaining the degree to which individual pupils
have profited by instruction. That purpose should
be attained by accurate, objective, and impartial
measurements. It should ever be kept in mind that
it is the pupil who is being tested or measured;
therefore, such factors as subjective judgments
should be reduced to a minimum if the results are
to be fair to all.

Scientific measurement has passed the experimental
stage. Because of its importance, all teachers
should have some knowledge of this subject. To be
thoroughly equipped for their positions, they must
know something of the aims, methods, materials,
results, and general educational implications of this
important work.

The four principal and most important types of
measurement in history in the modern high school are
oral questioning, the traditional examination, the
standard test, and the objective or new-type
examination. With so little of a final character known
about the relative merits of these methods, the
only course open to teachers and educators is to
consider both the logic and the growing body of
experimental findings supporting or undermining the
value of each.

Although, as has been stated, the first purpose
of a test is that of measuring the achievement of
pupils, history tests in the high school are also
being used for promotion, to test the efficiency of
a teacher, to locate beginning points in teaching,
and to determine school status. They may promote review
and recall; they may be used for the placement or
classification of pupils; and they may be given in order
to diagnose the difficulties of pupils, thereby providing
a basis for remedial measures. They may be used for the
motivation of school work and for the promotion of real,
not artificial, interest. History tests may provide
pupils with objective goals for school study. They may
measure memory. Most important of all, to the extent
to which their influence may be felt, history tests
may improve teaching.

For many workers in educational measurement it is more worthwhile to be able to recognize a good test by its earmarks than to know intimately the many tests, good, mediocre, and poor, that exist to-day. The chief criteria which should be kept in mind in selecting history tests in the high school are validity, reliability, objectivity, norms, duplicate forms, scaling, ease of administration, and cost. The two fundamental criteria for selecting a history test are validity and reliability; all others are subordinate. Objectivity in scoring is one important factor in reliability. Validity refers to whether or not a test accomplishes its purpose: a test is valid when it measures the ability or characteristic that it is supposed to measure. A test is reliable when it measures accurately whatever it does measure; in other words, when the same results are secured when it is given two or more times to the same pupils. Among the chief factors that affect reliability are objectivity, the length of the test, evenness of scaling of the test elements, and the directions for giving and scoring. A test or score is objective if it is not influenced by the personal opinion or judgment of the person doing the scoring, that is, if all competent scorers agree. For actual school practice, merely placing the items in an order of difficulty suffices to scale a comprehensive objective test. A test should extend to sufficiently low and sufficiently high limits of difficulty. Though not of great theoretical importance, the ease of administering a test and its cost are of practical importance.
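Two ideas in the paragraph above lend themselves to simple arithmetic: reliability as agreement between two administrations of the same test to the same pupils, and scaling by placing items in an order of difficulty. The Python sketch below is illustrative only. It assumes that reliability is expressed as the Pearson correlation between the two sets of scores (a common convention, not a procedure the text prescribes) and that an item's difficulty is judged by the share of pupils answering it correctly. The function names and all sample scores are hypothetical.

```python
from statistics import mean, pstdev


def retest_reliability(first, second):
    """Pearson correlation between two administrations of one test to the same pupils.

    first[i] and second[i] must be the scores of the same pupil (an assumed layout).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length score lists with at least two pupils")
    sd_first, sd_second = pstdev(first), pstdev(second)
    if sd_first == 0 or sd_second == 0:
        raise ValueError("scores do not vary, so the correlation is undefined")
    m_first, m_second = mean(first), mean(second)
    # Average product of deviations from the two means (population covariance).
    covariance = sum((a - m_first) * (b - m_second) for a, b in zip(first, second)) / len(first)
    return covariance / (sd_first * sd_second)


def order_by_difficulty(item_marks):
    """Return item names from easiest to hardest.

    item_marks maps each item to a list of 1 (right) or 0 (wrong), one entry per pupil;
    an item answered correctly by a larger share of pupils is treated as easier.
    """
    share_right = {item: sum(marks) / len(marks) for item, marks in item_marks.items()}
    return sorted(share_right, key=share_right.get, reverse=True)


# Hypothetical scores of five pupils on two sittings of the same test.
first_sitting = [42, 35, 50, 28, 45]
second_sitting = [40, 37, 48, 30, 44]
print(round(retest_reliability(first_sitting, second_sitting), 2))

# Hypothetical right/wrong marks of four pupils on three items.
print(order_by_difficulty({"item A": [1, 0, 0, 0], "item B": [1, 1, 1, 0], "item C": [1, 1, 0, 0]}))
```

A value near 1.0 means the two sittings placed the pupils in nearly the same relative positions; a low value means the test gave inconsistent results. Ordering by the share of right answers is only one way of judging difficulty, chosen here because it needs nothing beyond the pupils' own papers.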

As for the value of oral questioning for measurement in history in the high school, we must conclude that oral questioning is more logically a part of initial instruction than of final measurement. Strictly speaking, oral questioning does not usually constitute an examination. Oral examinations are sometimes employed, but their value for the more serious and final determination of achievement is doubtful. Oral questioning is primarily instructional; its value for measurement is subordinate.

Although there has been much recent unfavorable criticism of traditional examinations, they should not be entirely discarded in favor of new-type tests, but should be used on some occasions. Each of the two general types just mentioned has its peculiar merits and advantages, and should be employed when it best fulfills the desired end. Traditional examinations are usually much easier to prepare; they test a number of mental processes better than does the new type; they offer little opportunity for guessing, and perhaps little for cheating; they are less liable to confuse the pupil; and in several minor ways they are to be preferred. On the other hand, as a considerable mass of evidence shows, traditional examinations are less reliable than new-type tests, because they do not secure as good samplings of pupils' ability and knowledge, and because their scoring is relatively subjective and more difficult. Bluffing is also easier. The "traditional" examination, also known as the "essay" or "discussion" examination, was employed almost exclusively in the high school until the last few years, and it is still perhaps more usual than any other.

Standardized examinations have just completed the first quarter century of their existence. The curve of their use is rising rapidly, and their use has come to be an integral part of the work of teachers in many systems, both large and small. In general usage, the adjective "standardized" or "standard" implies that the test in question has been carefully constructed according to certain general principles, and embodies exercises of such forms that pupils' responses can be scored relatively, if not absolutely, objectively. It also implies that the test has been given widely enough that its results indicate what may be expected of pupils of a given age, grade, or other homogeneous group.

It was as a relief from the admitted weakness of the traditional examination, its subjectivity of marking, that the standard test was introduced. Its other principal aims are to control or standardize the conditions of the examination period, and to provide norms or standards for evaluating and interpreting the scores of individual pupils. The last aim is the most difficult and, on the whole, decidedly the least important.

Standardized tests have a legitimate place in high school measurement in history, and should not be wholly displaced by other types of measurement. They offer certain advantages which the other types cannot supply. They are characterized by the care exercised in their making, by the objectivity of their scoring, by their comprehensiveness within the field measured, by the universal character of their content, and by the fact that they are accompanied by norms, or standards of achievement. Standardized tests are generally more objective, reliable, and valid than either the traditional or the new-type examination. Their use also saves time in both preparation and scoring. The equivalent forms provided by most good standardized tests ensure greater accuracy in measuring progress. But the standardized test should not be the sole element in a measurement program; it has its limitations as well as its advantages. Among the former is the serious one that standardized tests are not flexible enough to meet local needs satisfactorily: they cannot be adapted to local school curricula.

The "new examination," or, better, the "new-type test," is the name commonly given to tests or exercises, generally constructed by a teacher for her own use, that employ the forms and scoring methods of standardized tests so as to possess relatively high objectivity, but that have not gone through a process of careful trial of the material included. New-type tests are more reliable than traditional examinations, both because they secure better samplings of pupils' ability and knowledge and because their scoring is relatively objective. Among their other advantages are these: pupils usually prefer them and are better satisfied with the marks they receive; emphasis is placed upon exact and accurate knowledge; knowledge of the subject being tested is measured without being mixed with ability in language, handwriting, and so forth; speed can be measured when desired; scoring is easier; more thought is usually required in their construction; bluffing is more difficult; and two or more forms of practically equivalent difficulty can be prepared.

Pre-tests  in  the  new-type  form  are  being  used 
for  diagnostic  purposes,  and  the  results  from  these 
tests  are  used  for  the  improvement  of  instruction 
through  diagnosing  the  shortcomings  and  difficulties 
of  pupils. 

Practice  tests  are  being  urged  to  train  the 
pupil  beforehand  in  the  use  of  new-type  tests. 

Workbooks  with  objective  tests  therein  have 
come  greatly  to  the  forefront. 

The  five  most  generally  useful  varieties  of 
new-type  tests  are:  single-answer,  multiple-answer, 
alternative,  completion,  and  matching. 
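Much of the objectivity of new-type tests comes from scoring every paper against a fixed key, so that any competent scorer reaches the same total. The Python sketch below illustrates that idea for items of the kinds just named. It is a hypothetical example, not a procedure taken from this thesis: the key format, the rule of one point per item (and one point per correct pair in a matching exercise), and the decision to ignore capitalization and surrounding spaces in written answers are all assumptions.

```python
def normalize(answer):
    """Ignore case and surrounding spaces in written answers (an assumed scoring rule)."""
    return answer.strip().lower() if isinstance(answer, str) else answer


def score_item(response, correct):
    """Points earned on one item when compared with the key."""
    if isinstance(correct, dict):
        # Matching exercise: one point for each correctly paired entry.
        if not isinstance(response, dict):
            return 0
        return sum(1 for left, right in correct.items()
                   if normalize(response.get(left)) == normalize(right))
    # Any other item: one point if the response agrees with the key.
    return 1 if normalize(response) == normalize(correct) else 0


def score_paper(answers, key):
    """Total score of one pupil's paper; unanswered items earn nothing."""
    return sum(score_item(answers.get(number), correct) for number, correct in key.items())


# Hypothetical key covering several of the varieties named above.
key = {
    1: "1492",   # completion: year of Columbus's first voyage
    2: False,    # alternative (true-false) item
    3: "c",      # item answered by choosing one lettered response
    4: {"Jefferson": "Declaration of Independence", "Madison": "Constitution"},  # matching
}
answers = {
    1: " 1492 ",
    2: False,
    3: "b",
    4: {"Jefferson": "Declaration of Independence", "Madison": "Emancipation Proclamation"},
}
print(score_paper(answers, key))  # 1 + 1 + 0 + 1 = 3
```

Because the key leaves nothing to the scorer's opinion, two scorers applying it to the same paper must arrive at the same total, which is the sense of "objective" used in this thesis.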

The teaching of history through testing in the high school should utilize a complete testing program. Such a program should generally consist of the traditional examination, standardized tests, and the so-called new-type examination. Each has its peculiar merits and advantages, and should be employed when it best fulfills the desired end.




Appendix. 

How  To  Secure  Tests  and  Directions  For  Their  Use. 



Tests are changing at a phenomenal rate, and changing for the better. It is the function of the frequent bulletins issued by book companies and bureaus of research to inform educators of the latest and best tests. All the important centers which distribute testing material are prepared to send free, or practically free, literature describing their tests. More than this, they are glad to give expert advice as to the test or tests which it is best to use in a particular situation. Finally, they are prepared, for a small charge, to send sample tests for inspection. The bureaus which issue tests usually do, and always should, send with the tests ordered a leaflet giving detailed directions for applying and scoring the tests, for tabulating results, and for computing pupil and class scores. The directions usually include norms for the test and, frequently, suggestions for the use of results. As a precaution, anyone writing for tests should request that all necessary directions for using them properly be sent.

In the field of history a number of tests are now available for use in the high school. It would be impossible to describe in detail all available history tests. The following is a list of some available standardized history tests for use in the high school, together with the names of their publishers.


Barr's Diagnostic Tests in American History
Public School Publishing Company (1918)

Columbia Research Bureau American History Test
World Book Company (1926)

The Gregory American History Tests
C. A. Gregory, University of Cincinnati (1923)

Harlan Test for Information in American History
Public School Publishing Company (1917)

Iowa General Information Test
Bureau of Educational Research and Service, University of Iowa (1927)

Pressey-Richards Test for the Understanding of American History
Public School Publishing Company (1922)

The Gregory-Owens Tests in Mediaeval and Modern History
C. A. Gregory, University of Cincinnati (1926)

Pressey Tests of Historical Judgment
Public School Publishing Company (1924)

Van Wagenen Reading Scales
Public School Publishing Company (1922)

American Council European History Test
World Book Company

The World Book Company, Yonkers-on-Hudson, New York, and the Public School Publishing Company, Bloomington, Illinois, are perhaps the largest publishers of standardized history tests.

The World Book Company has just issued a booklet entitled "Bibliography of Tests for Use in Schools," which sells for ten cents. This booklet includes tests sold by agencies other than the World Book Company.




A Bibliography of Books and Sources Used, in Part, in the Organization of this Thesis


1)  Brinkley,  Sterling  G. 

"Values  of  New-Type  Examinations  in  the 
High  School  with  Special  Reference  to 
History" 

Teachers  College,  Columbia  University 
Contributions to Education, No. 161

2) Christensen, A. M.

"A Suggestion as to Correcting Guessing on Examinations,"

Journal of Educational Research

14: 370-374, Dec. 1926

3)  Colvin,     Stephen  S. 

An Introduction to High School Teaching (1922)
The  Macmillan  Company,  New  York. 

4) Crawford, C. C., and Raynaldo, D. A.

"Some  Experimental  Comparisons  of  True- 
False  Tests  and  Traditional  Examinations," 
School  Review 

33:   698-706,     Nov.  1925. 

5)  Douglass,  H.R. 

Modern  Methods  in  High  School  Teaching  (1926) 
Houghton  Mifflin  Company,  Boston. 




6)  Elston,  Bertha 

"Improving  the  Teaching  of  History  in  the 
High  School  through  the  use  of  Tests," 
The  Historical  Outlook 

14:     300  -  305,      Nov.  1923 

7)  Everett,  Samuel 

"Objective Tests the Best Discoverer of Pupil Aptitudes,"

The  Historical  Outlook 

20:     335-337,        Nov.  1929 

8)  Hardy,      Ruth  E. 

"New  Types  of  Tests  in  Social  Science," 
The  Historical  Outlook 

14:     326-328,        Nov.  1923 

9)  Hill,        H.  C. 

"The Use of Tests in the Teaching of the Social Studies,"

The  Historical  Outlook 

20:     7  -  10,  Jan.  1929 

10)  Kelley,     T.  L. 

"Note on the Reliability of a Test: a Reply to Dr. Criticism,"

Journal  of  Educational  Psychology 

15:     193-204,        April,  1924 

11)  Kepner,  Tyler 

"An  Aspect  of  History  Testing," 
The  Historical  Outlook 

15:     414-417,      Dec.  1924 




12)  Kepner,  P.  T. 

"A  Survey  of  the  Test  Movement  in  History," 
Journal  of  Educational  Research 

7: 309-325, April, 1923

13)  Kinder,  J.  S. 

"Supplementing  our  Examinations," 
Education 

45:     557-566,        May,  1925 

14)  Krey,      A.  C. 

"What Does the New-Type Examination Measure in History?"

The  Historical  Outlook 

19:     159-162,        April,  1928 

15) Lincoln, Edward A.

Beginnings in Educational Measurement (1924)
J. B. Lippincott Company, Chicago, Ill.

16) Lindquist, E. F.

"Factors Determining Reliability of Test Norms,"

Journal  of  Psychology, 

21:     512-520,        Oct.  1930 

17) Lindquist, E. F., and Anderson, H. R.

"Objective  Testing  in  World  History," 
The  Historical  Outlook 

21: 115-122, March, 1930

18) McAfee, L. O.

"The Reliability of Non-Standardized Point Tests,"

Elementary  School  Journal 

24:     579-585,       April,  1924 




19) McCall, William A.

How  to  Measure  in  Education  (1922) 
The  Macmillan  Company,  New  York. 

20)  Mitchell, Elene 

Teaching  Values  in  New-Type  History  Tests 

(1930) 

World  Book  Company,  Yonkers-on-Hudson, 

New  York 

21)  Monroe,  W.  S. 

"Written  Examinations  and  Their  Improvement," 
The Historical Outlook

14: 211-219, June 1923
14: 306-318, Nov. 1923


22)  Moyer,    F.  E. 

"New  Types  of  History  Tests," 
The  Historical  Outlook 

14: 323-324, Nov. 1923

23)  Odell,     Charles  W. 

Educational  Measurement  in  High  School  (1930) 
The Century Company, New York.

24)  Odell,     Charles  W. 

Traditional Examinations and New-Type Tests

(1928) 

The  Century  Company,  New  York 

25)  Parker,  Samuel  C. 

Methods  of  Teaching  in  High  School 

(Revised  Edition  1920) 
Ginn and Company, Boston, Mass.

26) Pressey, Sidney L., and Pressey, Luella Cole

Introduction  to  the  Use  of  Standard  Tests 

(1923) 

World  Book  Company,  Yonkers-on-Hudson 

New  York 




27) Ruch, G. M.


The  Objective  or  New-Type  Examination 

(1929) 

Scott, Foresman and Company, New York

28) Ruch, G. M.

The  Improvement  of  the  Written  Examination 

(1924) 

Scott, Foresman and Company, New York

29) Ruch, G. M., and Stoddard, George D.

Tests  and  Measurements  in  High  School 
Instruction  (1927) 

World  Book  Company,  Yonkers-on-Hudson 

New  York 

30)  Rugg,      E.  M. 

"Character  and  Value  of  Standardized 
Tests  in  History," 
School  Review, 

27:  757-771,        Dec.  1919 

31)  Russell, Charles 

Classroom  Tests  (1926) 

Ginn and Company, Boston, Mass.

32)  Russell, Charles 

Standard  Tests  (1930) 

Ginn  and  Company,  Boston,  Mass. 

33) Sangren, P. V.

"The  Present  Status  of  Measurement  in 
the  Social  Studies," 
The  Historical  Outlook 

21:     279-283,  Oct.  1930 




34) Starch, D., and Elliott, E. C.

"Reliability of Grading Work in History,"
School Review

21: 676-681, Dec. 1913

35) Symonds, Percival M.

Measurement  in  Secondary  Education  (1927) 
The Macmillan Company, New York

36) Symonds, Percival M.

"A  Study  of  Extreme  Cases  of  Unreliability" 
Journal  of  Educational  Psychology 

15:     99-106,  Feb.  1924 

37)  Smith,    H.  L. ,  and  Wright,  W.W. 

Tests  and  Measurements  (1928) 

Silver, Burdett and Company, Boston, Mass.

38) Thorndike, E. L., Bregman, E. A., and Cable, U. V.

"The Selections of Tasks of Equal Difficulty by a Consensus of Opinion,"
Journal of Educational Research

9: 133-139, Feb. 1924

39)  Thornton, 2.  w. 

"The Use of Informational Tests in American History Teaching,"
The Historical Outlook

20: 12-16, Jan. 1929

40)  Trabue,  Marion  Rex 

Measuring  Results  in  Education  (1924) 
American  Book  Company,  Boston,  Mass. 




41)  Tryon,  R.M. 

"Standard  and  New-Type  Tests  in  the  Social 
Studies," 

The  Historical  Outlook 

18: 172-178, April, 1927

42)  Wesley,  E.  B. 

"Workbooks  in  the  Social  Studies," 
The  Historical  Outlook 

22:     151-154,        April,  1931 

43) Wilson, G. M.

"Criteria  of  a  Standardized  Test," 
Educational  Review 

71:     138-141,        March,  1926 

44)  Wood,      Ben  D. 

"Studies  in  Achievement  Tests," 
Journal  of  Educational  Psychology 

17:     125-139,        Feb.  1926 

45)  Wood,      Ben  D. 

"The  Measurement  of  College  Work," 
Educational  Administration  and  Supervision 
7:     301-334,  Sept.  1921 




